DOCUMENT RESUME

ED 243 475                                        IR 011 101

AUTHOR          Mobray, Deborah, Ed.
TITLE           CPE - A New Perspective: The Impact of the Technology
                Revolution. Proceedings of the Computer Performance
                Evaluation Users Group Meeting (19th, San Francisco,
                California, October 25-28, 1983). Final Report.
                Reports on Computer Science and Technology.
INSTITUTION     National Bureau of Standards (DOC), Washington, D.C.
                Inst. for Computer Sciences and Technology.
REPORT NO       NBS-SP-500-104
PUB DATE        Oct 83
NOTE            239p.
AVAILABLE FROM  Superintendent of Documents, U.S. Government Printing
                Office, Washington, DC 20402 ($6.50).
PUB TYPE        Collected Works - Conference Proceedings (021) --
                Viewpoints (120) -- Reports - Research/Technical
                (143)

EDRS PRICE      MF01/PC10 Plus Postage.
DESCRIPTORS     Computers; *Computer Software; Cost Estimates; *Data
                Processing; *Evaluation; Federal Government;
                Information Centers; Information Networks;
                *Microcomputers; Models; *Performance; Program
                Administration; Program Improvement; Quality Control;
                Statistical Analysis; Telecommunications
IDENTIFIERS     *Computer Performance Evaluation; Office Automation;
                Packet Switched Networks; Software Maintenance;
                *Users

ABSTRACT

Papers on local area networks (LANs), modelling
techniques, software improvement, capacity planning, software
engineering, microcomputers and end user computing, cost accounting
and chargeback, configuration and performance management, and
benchmarking presented at this conference include: (1) "Theoretical
Performance Analysis of Virtual Circuit LAN Sliding Window Flow
Control," by E. Arthurs and others; (2) "Modelling and Monitoring a
LAN, One Experience," by W. Bruce Watson; (3) "Queue Length
Characteristics at Very Fast, Constant Service Time Merger Nodes," by
Chaim Ziegler; (4) "The Application of Multivariate Statistical
Techniques to Computer Performance Evaluation Using Simulated Data,"
by Thomas C. Hartrum; (5) "Improving the Accuracy of a
Working-Set-Oriented Generative Model of Program Behavior," by
Domenico Ferrari and Tzong-yu Paul Lee; (6) "Software Improvement
Program," by Opal R. Stroup; (7) "Software Improvement Program (SIP):
A Treatment for Software Senility," by Carol A. Houtz; (8) "Software
Improvement through Automated Normalization," by Michael G. Walker;
(9) "Algebraic Models for CPU (Central Processing Unit) Sizing," by
Robert A. Orchard; (10) "Establishing a Software Engineering
Technology (SET)," by L. Arnold Johnson and William R. Milligan; (11)
"Characteristics of Software Development Team Structures and Their
Impact on Software Development," by Anneliese von Mayrhauser; (12)
"Information Centers: The User's Answer to the Computer Room," by
Esther P. Georgatos; (13) "An Organization Model and Case Study for
Microcomputer CPE (Computer Performance Evaluation)," by Malcolm

U.S. Department 
of Commerce 

National Bureau 
of Standards 



Computer Science 
and Technology

NBS Special Publication 500-104

Proceedings of the 
Computer Performance 
Evaluation Users Group 
19th Meeting 




"CPE - A NEW PERSPECTIVE: 
The impact of the technology 
revolution." 



U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)

This document has been reproduced as
received from the person or organization
originating it.

Minor changes have been made to improve
reproduction quality.

Points of view or opinions stated in this document
do not necessarily represent official NIE
position or policy.






NATIONAL BUREAU OF STANDARDS 



The National Bureau of Standards¹ was established by an act of Congress on March 3, 1901.
The Bureau's overall goal is to strengthen and advance the Nation's science and technology
and facilitate their effective application for public benefit. To this end, the Bureau conducts
research and provides: (1) a basis for the Nation's physical measurement system, (2) scientific
and technological services for industry and government, (3) a technical basis for equity in
trade, and (4) technical services to promote public safety. The Bureau's technical work is
performed by the National Measurement Laboratory, the National Engineering Laboratory, and
the Institute for Computer Sciences and Technology.

THE NATIONAL MEASUREMENT LABORATORY provides the national system of
physical and chemical and materials measurement; coordinates the system with measurement
systems of other nations and furnishes essential services leading to accurate and uniform
physical and chemical measurement throughout the Nation's scientific community, industry,
and commerce; conducts materials research leading to improved methods of measurement,
standards, and data on the properties of materials needed by industry, commerce, educational
institutions, and Government; provides advisory and research services to other Government
agencies; develops, produces, and distributes Standard Reference Materials; and provides
calibration services. The Laboratory consists of the following centers:

Absolute Physical Quantities² - Radiation Research - Chemical Physics -
Analytical Chemistry - Materials Science

THE NATIONAL ENGINEERING LABORATORY provides technology and technical
services to the public and private sectors to address national needs and to solve national
problems; conducts research in engineering and applied science in support of these efforts;
builds and maintains competence in the necessary disciplines required to carry out this
research and technical service; develops engineering data and measurement capabilities;
provides engineering measurement traceability services; develops test methods and proposes
engineering standards and code changes; develops and proposes new engineering practices;
and develops and improves mechanisms to transfer results of its research to the ultimate user.
The Laboratory consists of the following centers:

Applied Mathematics - Electronics and Electrical Engineering² - Manufacturing
Engineering - Building Technology - Fire Research - Chemical Engineering²

THE INSTITUTE FOR COMPUTER SCIENCES AND TECHNOLOGY conducts
research and provides scientific and technical services to aid Federal agencies in the selection,
acquisition, application, and use of computer technology to improve effectiveness and
economy in Government operations in accordance with Public Law 89-306 (40 U.S.C. 759),
relevant Executive Orders, and other directives; carries out this mission by managing the
Federal Information Processing Standards Program, developing Federal ADP standards
guidelines, and managing Federal participation in ADP voluntary standardization activities;
provides scientific and technological advisory services and assistance to Federal agencies; and
provides the technical foundation for computer-related policies of the Federal Government.
The Institute consists of the following centers:

Programming Science and Technology - Computer Systems Engineering.

¹Headquarters and Laboratories at Gaithersburg, MD, unless otherwise noted;
mailing address Washington, DC 20234.

²Some divisions within the center are located at Boulder, CO 80303.






Computer Science 
and Technology 



NBS Special Publication 500-104

Proceedings of the 
Computer Performance 
Evaluation Users Group (CPEUG) 
19th Meeting 

San Francisco, California 
October 25-28, 1983



Proceedings Editor 

Deborah Mobray 

Conference Host 

Navy Regional Data Automation Center 
Department of the Navy 

Sponsored by 

Institute for Computer Sciences and Technology 
National Bureau of Standards
Washington, DC 20234




U.S. DEPARTMENT OF COMMERCE 



Malcolm Baldrige, Secretary



National Bureau of Standards 

Ernest Ambler, Director 



Issued October 1983 






Reports on Computer Science and Technology

The National Bureau of Standards has a special responsibility within the Federal
Government for computer science and technology activities. The programs of the
NBS Institute for Computer Sciences and Technology are designed to provide ADP
standards, guidelines, and technical advisory services to improve the effectiveness
of computer utilization in the Federal sector, and to perform appropriate research
and development efforts as foundation for such activities and programs. This
publication series will report these NBS efforts to the Federal computer community as
well as to interested specialists in the academic and private sectors. Those wishing
to receive notices of publications in this series should complete and return the form
at the end of this publication.



Library of Congress Catalog Card Number: 82-600594 



National Bureau of Standards Special Publication 500-104
Natl. Bur. Stand. (U.S.), Spec. Publ. 500-104, 236 pages (Oct. 1983) 

CODEN: XNBSAV 



For sale by the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. 20402 

Price $6.50 
(Add 25 percent for other than U.S. mailing) 




U.S. GOVERNMENT PRINTING OFFICE 
WASHINGTON: 1983 






FOREWORD 



The data processing environment has undergone profound changes since CPEUG was
founded in 1971. Microcomputers at that time existed only in specialized
process control applications. Now end-user microcomputers are commonplace and
their users have expectations unheard of twelve years ago. In parallel has been
the increasing perception of traditional large mainframe data processing as
only one aspect of a broader view of information resources and information
technology. With these changes, the challenge of providing efficient and
effective user support has also grown.

It is with these changes in mind that this year's CPEUG conference has chosen as
a theme "CPE - A New Perspective: The impact of the technology revolution." This
year's conference offers topics ranging from microcomputers to supercomputers.
The increasingly complex area of data communications is presented as well as
topics in office automation, software improvement and engineering, capacity
planning, and quality assurance, to mention just a few. The diversity of topics
reflects the broad range of areas which CPE analysts must now consider.

The challenges inherent in the increasingly complex areas of information
technology also provide new opportunities to increase the effectiveness of the
services we provide. Even as the technology has grown so have the volume and
type of information which the users wish to store, manipulate, and retrieve. We
must be knowledgeable in many new areas to ensure that we are using not only the
most efficient means available but also the most effective.

This year's CPEUG topics were specifically chosen to reflect the breadth of the 
CPE field. I believe you will enjoy them as well as learn from them. 



JOHN CARON 

CPEUG 83 Conference Chairperson 



PREFACE 



The theme of CPEUG 83, "CPE - A NEW PERSPECTIVE: The impact of the
technology revolution," focuses on the rapid introduction of sophisticated
end-user technology, and addresses the impact of this revolution on CPE and the
CPE professional. The debate intensifies over the question raised several years
ago by a former CPEUG Program Chairman: "How will CPE, traditionally associated
with large central computers, change in an era of smaller, cheaper hardware and
improved digital communications?" The CPE professional is
challenged to react to the new technology and trends. The keynote address,
"Microcomputers: The Risks and Rewards," and the keynote panel, "Information
System Cost Performance - New Directions," highlight and set the framework for
this challenge and debate. The conference focuses on the integration of user
microcomputer systems into the overall ADP structure and concentrates on micros
and end user computing activities ("Strengthening the End-User Interface" and
"Managing End-User Computing," for example). The growing relative importance of
software is recognized and software related issues are addressed in several
sessions.

As in the previous Conferences, tutorials, case studies, panels, and
technical sessions are included in the program. There are several sessions that
address the information needs of the first-time attendees, experienced CPE
analysts, managers, and interested data processing professionals. The CPEUG
Conference is one of the indispensable events on the calendar of computer
performance professionals. CPEUG 83 provides a forum for shaping subsequent
Conferences throughout the 1980's.

The CPEUG 83 program was the work of many people. Paul Roth, Vice
Chairperson for academia; Jim Sprung, Vice Chairperson for industry; Arnold
Johnson, Vice Chairperson for Government; and Dr. Deborah Mobray, Proceedings
Editor, were this year's vital links to broaden program participation. The
Conference Committee, session chairpersons, authors, and tutors all deserve
recognition for their time, patience, and participation. The invaluable support
of Sylvia Mabie merits special thanks.



CHARLES A. SELF 
CPEUG 83 Chairperson 
October 1983 

These Proceedings record the papers that were presented at the Nineteenth
Meeting of the Computer Performance Evaluation Users Group (CPEUG 83) held
October 25-28, 1983 in San Francisco, CA. CPEUG 83 recognized the rapid
introduction of sophisticated end-user technology into the information processing
environment and addressed the challenges posed to the CPE community. CPEUG 83
offered topics ranging from microcomputers to supercomputers. The increasingly
complex area of data communications was presented as well as topics in office
automation, software improvement and engineering, capacity planning, and quality
assurance. The program was divided into three parallel sessions, and included
technical papers, case studies, tutorials, and panels. Technical papers are
presented in the Proceedings in their entirety.

Key words: acquisition, benchmarking, capacity planning, computer
performance evaluation, configuration management/quality assurance, cost
accounting and chargeback, data communications, end-user computing, local area
networks, microcomputers, modeling techniques, office automation, performance
management, software engineering, and software improvement.



The material contained herein is the viewpoint of the authors of specific
papers. Publication of their papers in this volume does not necessarily
constitute an endorsement by the Computer Performance Evaluation Users Group
(CPEUG) or the National Bureau of Standards. The material has been published in
an effort to disseminate information and to promote the state-of-the-art of
computer performance measurement, simulation, and evaluation.

Gary Fisher
Federal Conversion Support Center 
Washington, DC 

FRIDAY, OCTOBER 28

9:00 AM SESSION: Performance Management
Chairperson: Donald R. Deese
Federal Computer Performance Evaluation and Simulation
Center (FEDSIM)
Washington, DC

The Terminal Probe Method Revisited: Some Statistical
Considerations
Luis Felipe Cabrera
Catholic University of Chile
Santiago, Chile

Some Elements of Software Functions and Cost Analysis
as Related to Performance
John E. Gaffney, Jr.
International Business Machines
Gaithersburg, MD

Modeling and Measuring to Improve Network Cost
Performance
Ronald K. Leighton
GTE Services Corporation
Tampa, FL

11:00 AM SESSION: Cost Accounting and Chargeback
Chairperson: Dean Halstead
Federal Computer Performance Evaluation and Simulation
Center (FEDSIM)
Washington, DC

Standard Costing for ADP Services
David R. Vincent
Institute for Software Engineering
Sunnyvale, CA

Cost and Revenue System at Parklawn Data Center
Tim Carrico
Parklawn Data Center
Health and Human Services
Rockville, MD

Developing a DP Charging System for the FAA
Harvey Kaplan
Federal Aviation Administration
Washington, DC

TRACK C PROGRAM

WEDNESDAY, OCTOBER 26

9:00 AM TUTORIAL: Requirements Analysis and Workload
Characterization

Carl Palmer
General Accounting Office
Washington, DC

10:00 AM TUTORIAL: Use of Benchmarking in the Federal ADP
Procurement Process

Dennis Shaw
General Accounting Office
Washington, DC

11:00 AM TUTORIAL: Software Development Guidelines

Anneliese von Mayrhauser
Illinois Institute of Technology
Chicago, IL

3:00 PM TUTORIAL: Microcomputer Software

John Caron
Office of Software Development and
Information Technology
General Services Administration
Washington, DC

4:00 PM TUTORIAL: Usability

David F. Stevens
Lawrence Berkeley Laboratory
University of California
Berkeley, CA

THURSDAY, OCTOBER 27

9:00 AM TUTORIAL: IRM Planning

Nancy Doane
Federal IRM Planning Support Program
Washington, DC

10:00 AM TUTORIAL: Computer Service Selection

Anneliese von Mayrhauser
Dennis Erwin Witte
Illinois Institute of Technology
Chicago, IL

11:00 AM TUTORIAL: Security Certification of Applications
Software

Fred Tompkins
MITRE
McLean, VA

3:00 PM TUTORIAL: ACMS: A Computer Modeling System

Duane Ball
Federal Computer Performance Evaluation
and Simulation Center (FEDSIM)
Washington, DC

4:00 PM TUTORIAL: Software Instrumentation Points

James P. Bouhana
Wang Institute of Graduate Studies
Tyngsboro, MA

FRIDAY, OCTOBER 28

9:00 AM TUTORIAL: Implementing a DP Chargeback System

Dean Halstead
Federal Computer Performance Evaluation
and Simulation Center (FEDSIM)
Washington, DC

Theoretical Performance Analysis of Virtual Circuit LAN
Sliding Window Flow Control

E. Arthurs
G. L. Chesson
B. W. Stuck

Bell Laboratories
Murray Hill, New Jersey 07974



ABSTRACT 

A transmitter breaks a message up into packets and transmits the
packets to a receiver over a single virtual circuit within a local area
network. The receiver has a finite amount of storage capacity for
buffering messages. A sliding window protocol turns the transmitter
on and off to ensure there is always storage room in the receiver for
packets. Mean throughput rate and delay statistics are studied as a
function of model parameters.



1. Introduction

Our purpose is to study the traffic handling characteristics of a
policy for pacing the flow of information from a transmitter to a
receiver over a logical abstraction of a physical circuit (a so-called
virtual circuit) within a local area network. If the receiver has only
a limited amount of storage, and the transmitter is faster than the
receiver, messages can be transmitted and blocked or rejected by the
receiver because no room is available. A protocol is used for
controlling the flow of information to ensure that no packet is lost
due to no room being available, as well as for other reasons. Since
the transmitter must be turned on and off to ensure no messages are
lost, what is the penalty in mean throughput rate and message delay
statistics as a function of receiver buffering and transmitter and
receiver processing speeds? References on this subject are found
elsewhere (e.g., Tanenbaum, 1981, pp. 187-196).

Examples of such mechanisms are stop-start protocols, where the
transmitter stops until the receiver acknowledges receipt of the
message (e.g., Binary Synchronous Communications (IBM)); Digital
Equipment's product line for Digital Network Architecture
(Wecker, 1980; Tanenbaum, 1981, pp. 172-174); IBM's product line
for the Systems Network Architecture framework (Green, 1979; Atkins,
1980); the Defense Advanced Research Projects Agency Transport
Control Protocol (Tanenbaum, 1981, pp. 373-377); and CCITT's X.25
(Tanenbaum, 1981, pp. 167-172). Such a mechanism would be used to
transfer files from one computer to another, for example, over a local
area network.

In our opinion, at the present time a great deal of guidance is
required to engineer such systems (cf. the current literature: Bux,
Kuemmerle, Truong, 1980; Easton, 1980; Fayolle, Gelenbe, Pujolle,
1978; Kleinrock, 1978A, 1978B; Reiser, 1979; Sunshine, 1976, 1977;
Traynham, Steen, 1977; Yu, Majithia, 1979; Luderer, Che,
Marshall, 1982; Luderer, Che, Haggerty, Kirslis, Marshall, 1981).

This work analyzes a model for a class of protocols, so-called
sliding window* protocols, for controlling the flow of information
between a transmitter and receiver. Granted certain assumptions
that are felt to be reasonable for local area network applications, it is
possible to engineer a virtual circuit to achieve predictable
performance. The protocol is described in detail elsewhere (Knuth,
1981; Tanenbaum, 1981, pp. 148-164); here, in the interest of
brevity, we will describe only the aspects pertinent to traffic handling
characteristics. It is representative of a great many protocols
currently in use, each of which differs in detail in terms of error
handling, but not in terms of pacing the flow of data between the
transmitter and receiver (cf. Schwartz, 1982, for an analysis of a
protocol with the same control of flow of data but a different
acknowledgement strategy).

* The term window arises from picturing a window onto the stream of packets at the
transmitter, with the window open onto packets that have been transmitted but not
yet acknowledged. The window slides as packets are acknowledged along with the
transmitted packets.

We ignore a wide variety of phenomena that must be dealt with in 
an actual system that in our opinion are sufficiently rare to have 
negligible impact on traffic handling characteristics. In particular, 
we ignore the impact of the local area network corrupting or losing 
packets, the impact of using the local area network and the 
interfaces for both control (virtual circuit set up and take down, 
packet acknowledgement) and data transfer, the impact of local area 
network delay being dependent upon the workload, and a variety of 
hardware and software failures. 

Lest the reader feel that this problem is straightforward, we quote 
one authority: 

"...flow control procedures are rather difficult to invent and
extremely difficult to analyze. To date there is no satisfactory
theory or procedure for designing flow control procedures, much
less evaluating their performance..." L. Kleinrock, 1978A

2. Summary 

We determine quantitative tradeoffs between the time the
transmitter and receiver spend on packet processing, local area
network propagation delays, receiver storage space for buffering
packets, and flow control parameters (in particular the maximum
number of unacknowledged packets at the transmitter, denoted by W
for window) to achieve different levels of performance (e.g., mean
throughput rate and delay statistics).

A mathematical model or abstraction of an actual communication
system is developed. First, we summarize the performance of this
model using mean values for transmitter and receiver packet
processing times before analyzing the impact of fluctuations on
performance. In our opinion, the mean value analysis is probably
the most important level of measurement and analysis in practice.

A system can be engineered to have a low message throughput rate
and small message delays, or it can have a high message throughput
rate and large message delays. The problem is to find the point at
which message delay becomes unacceptable. We do so in two steps:
first we find the maximum mean throughput rate, which presumably
will lead to large delays; then we back off from the maximum mean
throughput rate to a lower rate which will lead to smaller
(acceptable) delays. There are three potential bottlenecks in this
communication system: the transmitter, the receiver, and the local
area network. In many applications, it is desirable for the
transmitter and receiver to be bottlenecks, not the local area
network; put differently, we demand that the different devices connected
to the local area network be incapable of generating enough
messages to cause unacceptable delays in message transmission
through the channel.

From discussions with a variety of knowledgeable engineers,
currently available products achieve a packet delay in the local area
network of one hundred to five hundred microseconds, while packet
transmitter and receiver processing delays of one to five milliseconds
are typical. For this case, where network delays between the
transmitter and receiver are negligible compared to packet
processing times, when the transmitter and receiver are mismatched
in speed by a ratio of two or more, i.e., either the transmitter is
much faster or slower than the receiver, one or the other is a
bottleneck, and the mean value analysis appears adequate for sizing
mean throughput rate and mean delay. It is only in the intermediate
region of comparable transmitter and receiver speeds that there is
significant interaction between the choice of flow control parameters
and message arrival and transmission time statistics.

Upper and lower bounds on mean throughput rate are determined
using only the mean times to handle each stage of packet processing
and the receiver buffer size; determining the lower bound was
previously an unsolved problem. Moreover these bounds are sharp,
in the sense that they can be achieved if the fluctuations about the
means become sufficiently small or sufficiently large. Put differently,
given only mean value information, these bounds quantify the best
possible mean throughput rate. The method of analysis is based
upon a systematic application of Little's Law (Little, 1961). The
analysis leading to the upper bound is well known; the lower bound
is evidently novel. The lower bound is positive and hence we have
shown that at this level of analysis this protocol cannot deadlock.
Deadlock would occur if the transmitter and receiver both had packets
and acknowledgements to send but could not send them because receiver
buffer space was not available for the transmitter and vice versa.

Three cases are examined in detail: buffering one, two, and an
infinite number of message packets at the receiver. Buffering one
packet allows for no concurrency or parallel operation of the
transmitter and receiver; buffering more than one packet allows for
some concurrency and also smooths out bursts of packets, storing
them until they can be processed. Performance (mean throughput
rate and packet delay) can be significantly improved (provided the
transmitter and receiver packet processing times are comparable to
one another) in going from one to two packet buffering at the
receiver, and is only marginally improved in going from two to
infinite packet buffering. This is clear on physical grounds: there is
only one transmitter and one receiver, and the best possible
concurrency is achieved when both are busy, i.e., when the
transmitter is filling one buffer and the receiver is emptying another.
A single packet buffer at the receiver is a bottleneck limiting the
maximum mean throughput rate of packet transmission; moving to
two or more buffers at the receiver allows the transmitter or the
receiver processor to become the bottleneck. Since it might not be
known in practice which is the bottleneck, this suggests choosing
two buffers in the receiver.
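
The effect of going from one to two receiver buffers can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes the familiar bottleneck form of a mean-value upper bound, in which throughput can exceed neither the transmitter rate, nor the receiver rate, nor W packets per round trip (the last limit follows from Little's Law with at most W packets in the system). The function name, parameter names, and the one-millisecond example values are illustrative assumptions.

```python
import math


def throughput_upper_bound(t_trans, t_rec, d_forward=0.0, d_back=0.0, window=1):
    """Assumed bottleneck bound on mean throughput (packets per unit time).

    t_trans, t_rec : mean transmitter and receiver processing time per packet
    d_forward      : mean propagation time, transmitter to receiver
    d_back         : mean propagation time of the acknowledgement
    window         : W, the maximum number of packets in the system
    """
    # Minimum time one packet needs to go once around the loop.
    cycle = t_trans + d_forward + t_rec + d_back
    # Little's Law: at most `window` packets in the system per cycle time.
    window_limit = window / cycle
    return min(1.0 / t_trans, 1.0 / t_rec, window_limit)


# Comparable transmitter and receiver (1 ms each), negligible network delay.
for w in (1, 2, math.inf):
    print(w, throughput_upper_bound(1.0, 1.0, window=w))
# prints: 1 0.5, then 2 1.0, then inf 1.0 (packets per millisecond)
```

Under these assumptions the bound doubles between W = 1 and W = 2 and does not move at all beyond W = 2, which is the behavior the paragraph above describes.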

To proceed further, we make additional assumptions beyond mean 
values, to test both our modeling assumptions as well as the 
numerical parameters. In particular, we can quantify the impact of 
fluctuations about the mean values on performance. 

The first case is a Jackson network model with associated product
form distribution for the number of packets at each node in the
network. The Jackson network assumptions allow us to obtain an
exact calculation of mean throughput rate. The analysis uses
techniques that are standard for Jackson networks (Kelly, 1976,
1979). In practice, fluctuations in packet processing times at the
transmitter and receiver are widely felt to be much smaller than those
found in this type of model, so this model would lead to pessimistic
performance estimates. In fact, there is little difference between the
mean throughput rate upper bound, which is achieved with no
fluctuations in packet processing times, and the Jackson network
analysis. In either case, choosing two buffers at the receiver achieves
virtually all of the traffic handling gain.
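
One standard way to compute exact mean throughput for a closed product-form network is mean value analysis (MVA). The sketch below is an illustration under stated assumptions, not the authors' calculation: it assumes the source is saturated (the staging queue is never empty, so exactly W packets circulate), treats the transmitter and receiver as single-server queues with exponentially distributed service, and treats the two propagation stages as pure delays (with W servers and at most W packets, no packet ever waits there). All names are hypothetical.

```python
def mva_throughput(service_times, delay_times, window):
    """Exact MVA throughput for a closed network with `window` packets.

    service_times : mean service time at each single-server queue
                    (here: transmitter, receiver)
    delay_times   : mean time at each pure-delay stage
                    (here: the two propagation stages)
    window        : W, the number of circulating packets (integer >= 1)
    """
    queue_len = [0.0] * len(service_times)  # mean queue lengths with 0 packets
    total_delay = sum(delay_times)
    throughput = 0.0
    for n in range(1, window + 1):
        # Arrival theorem: an arriving packet sees the network with n-1 packets.
        residence = [s * (1.0 + q) for s, q in zip(service_times, queue_len)]
        # Little's Law applied to the whole cycle.
        throughput = n / (sum(residence) + total_delay)
        # Little's Law applied to each queue.
        queue_len = [throughput * r for r in residence]
    return throughput


# Receiver twice as slow as the transmitter, negligible network delay.
for w in (1, 2, 4, 8):
    print(w, round(mva_throughput([1.0, 2.0], [0.0, 0.0], w), 4))
```

The values can be compared with the no-fluctuation bound for the same mean times to see how much throughput the exponential assumption costs at each window size.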



The second model deals with the mean throughput rate and delay
statistics with negligible network delay, for the special cases of
W = 1, W = 2, and W = ∞. If the transmitter and receiver packet
processing times are independent identically distributed constant
random variables, we obtain not simply mean values but
distributions. We show that the long term time averaged delay
statistics for W = 2 and W = ∞ are identical. This was previously
only conjecture.

3. System and Model Description 

The system that motivated this work is described elsewhere (Fraser, 
1979; Chesson, 1979, 1980; Luderer, Che, Haggerty, Kirslis, 
Marshall, 1981; Luderer, Che, Marshall, 1982). 

3.1 Hardware Configuration 

A set of terminals and computer systems are interconnected via a 
local area network switching system. 

3.2 Functional Operation 

A pair of digital systems communicate with one another as follows:
After a full duplex virtual circuit is set up over the local area
network, one system transmits a message to the other; the
transmitter breaks each message over each virtual circuit up into
packets, and stores the transmitted packets until the receiver sends
an acknowledgement that transmitted packets were properly
received.

When the system is started, the transmitter starts a packet sequence
counter, denoted C, at zero. Messages are transmitted in order of
arrival; packets within messages are transmitted in order over the
virtual circuit. Each packet has a sequence number that is used to
pace the flow of data from the source to the sink. Each time the
transmitter sends a packet, C is incremented by one; each time the
transmitter receives an acknowledgement, it decrements C by one
and flushes the acknowledged packet from its transmitter buffer. Hence, the
packet sequence counter slides from the first packet to the last, with
packets from different messages possibly interleaved. The maximum
number of packets that can be buffered by the receiver is called the
window, denoted W. The largest the transmitter packet sequence
counter can be is W: the transmitter knows the receiver can buffer
at most this many packets. Each packet holds one receiver buffer;
when the packet sequence counter strikes W the transmitter ceases
to send messages until a minimum number of acknowledgements are
received. A start/stop protocol would have a window of size one
(W = 1): the transmitter would send the first packet of a message
and wait for a positive acknowledgement before sending the next
packet, and so forth. A double buffering protocol would have a
window size of two (W = 2).
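
The counter mechanism described above can be sketched in a few lines. This is a minimal illustration, not the protocol implementation: the class and method names are hypothetical, each acknowledgement is assumed to cover exactly one packet, and the transmitter is assumed to resume as soon as a single acknowledgement arrives (the text leaves the "minimum number of acknowledgements" unspecified).

```python
from collections import deque


class WindowedTransmitter:
    """Transmitter side of a sliding window with counter C and limit W."""

    def __init__(self, window):
        self.window = window      # W: packets the receiver can buffer
        self.count = 0            # C: packets sent but not yet acknowledged
        self.unacked = deque()    # transmitter buffer of unacknowledged packets

    def can_send(self):
        # The transmitter is "on" only while C is below W.
        return self.count < self.window

    def send(self, packet):
        """Try to send one packet; return False if the window is closed."""
        if not self.can_send():
            return False
        self.unacked.append(packet)
        self.count += 1
        return True

    def acknowledge(self):
        """Handle one acknowledgement: decrement C, flush the oldest packet."""
        if not self.unacked:
            raise ValueError("acknowledgement received with nothing outstanding")
        self.count -= 1
        return self.unacked.popleft()


# Double buffering (W = 2): the third send is refused until an ack arrives.
tx = WindowedTransmitter(window=2)
results = [tx.send("p0"), tx.send("p1"), tx.send("p2")]
flushed = tx.acknowledge()
results.append(tx.send("p2"))
print(results, flushed)  # [True, True, False, True] p0
```

Setting `window=1` reproduces the start/stop behavior: every send after the first is refused until the previous packet has been acknowledged.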

How frequently should the receiver acknowledge packets? Ideally it
should be done after each packet; however, if this requires an
unacceptable amount of processor time at the transmitter or at the
receiver or both, then acknowledgements could be batched. The
normal operating regime is where control packets are much less
frequent than data packets; hence, to first order, we might focus
attention on the data packet mean processing time alone.

3.3 Queueing Network Model 

The queueing model of this system (Figure 1) follows the above 
description quite closely. 

3.3.1 Queues and Servers The system consists of a staging queue 
(with no server), a transmitter queue (with one server), a transmitter 
to receiver queue (with W servers), a receiver queue (with one
server), and a receiver to transmitter queue (with W servers).

3.3.2 Packet Flow Through Queueing Network A packet migrates 
from one queue to another: packets arrive at an external queue
where they are staged, before migrating to the transmitter queue,
then through the transmitter to receiver queue, then to the receiver
queue, then finally to the receiver to transmitter queue, before
leaving the system; a packet is in the system if it is in the
transmitter or receiver queue (waiting or in execution), or in the
propagation queue from the transmitter to the receiver or vice
versa.

3.3.3 Service Required for Each Step of Packet Communications
Each packet requires some processing time by the transmitter,
denoted $T_{trans}$, including both packet processing time and data
transmission time. Each packet propagates from the transmitter to the
receiver in a mean time denoted $T_{trans-rec}$. Each packet requires
receiver processing time, denoted $T_{rec}$. Each receiver
acknowledgement packet propagates from the receiver to the
transmitter in a time interval denoted $T_{rec-trans}$. The receiver and
transmitter processing times are assumed to include the time to 
handle acknowledgement processing. 

3.3.4 Flow Control Policy Arriving packets are stored in the
staging queue. If there are fewer than W packets in the system, the
packet immediately enters the transmitter queue; otherwise, the 
packet waits in the staging queue. 

3.3.5 Phenomena Ignored By Model If a packet is not received by 
the receiver (e.g., because it was lost in transmission, because the 
receiver buffer overflowed, because the sink acknowledgement that 
the packet was received is lost, or for other reasons) the sender will 
not receive an acknowledgement within a given time interval called a 
time out interval (measured from the end of a given packet 
transmission) and the sender will retransmit the packet. If the 
receiver did in fact correctly receive a packet, when a new copy of 
that packet arrives it will be rejected and another acknowledgement 
will be sent. We ignore the impact of time outs, failures of different 
sorts, and noise which can garble a packet. Furthermore, we assume 
the packet delay due to the local area network is not a function of 
the message load, i.e., we assume the local area network simply adds 
delay to packet transmission. In a well designed system, these would 
be rare events, having little impact on performance. 



4. Mean Throughput Rate Mean Value Analysis (cf. Reiser, 1979; Fayolle et
al., 1978)

The ingredients in the mean value analysis are 

[1] The transmitter processes a packet; this step has a mean
duration $T_{trans}$ and it requires the transmitter processor.

[2] The packet propagates over the link from the transmitter to
the receiver; this step has a mean duration $T_{trans-rec}$.

[3] The receiver processes the packet; this step has a mean
duration $T_{rec}$ and it requires the receiver processor.

[4] Acknowledgements of correct receipt of the packet are
batched up and then propagate from the receiver to the
transmitter; this step has a mean duration $T_{rec-trans}$.

The acknowledgement processing per message at the transmitter and
receiver is assumed to be included in $T_{trans}$ and $T_{rec}$ respectively.

In an appendix, we obtain upper and lower bounds on the maximum 
mean throughput rate, assuming there are always sufficiently many
packets at the staging queue that there are W packets in the system.
Here we simply summarize the results: 

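$$\lambda_{lower} = \frac{1}{T_{trans} + T_{trans-rec} + T_{rec} + T_{rec-trans}}, \qquad \lambda_{upper} = \min\left[\frac{1}{T_{trans}}, \frac{1}{T_{rec}}, \frac{W}{T_{trans} + T_{trans-rec} + T_{rec} + T_{rec-trans}}\right] \tag{1}$$

The physical interpretation of the upper bound on mean throughput
rate is as follows:

• If the transmitter is the bottleneck, then

$$\lambda_{upper} = \frac{1}{T_{trans}} \tag{2a}$$

• If the receiver is the bottleneck, then

$$\lambda_{upper} = \frac{1}{T_{rec}} \tag{2b}$$

• If the number of buffers is the bottleneck, then

$$\lambda_{upper} = \frac{W}{T_{trans} + T_{trans-rec} + T_{rec} + T_{rec-trans}} \tag{2c}$$

Figure 1. Queueing Network Block Diagram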



The physical interpretation of the lower bound is that at most one 
packet at a time is being handled by the system. The practical 
import of the lower bound is that the maximum mean throughput 
rate is always positive, and hence the system cannot deadlock or 
stop transmitting packets.
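
The bounds can be evaluated numerically with a short sketch. It assumes the reading given in the physical interpretation above: the upper bound is the smallest of the transmitter rate, the receiver rate, and W packets per round trip, and the lower bound is one packet per round trip. The function and parameter names are hypothetical.

```python
import math


def throughput_bounds(window, t_trans, t_rec, t_trans_rec=0.0, t_rec_trans=0.0):
    """Return (lower, upper) bounds on the maximum mean throughput rate."""
    if window < 1:
        raise ValueError("window must be at least 1")
    # Mean time for one packet to pass through all four steps.
    cycle = t_trans + t_trans_rec + t_rec + t_rec_trans
    lower = 1.0 / cycle                      # at most one packet handled at a time
    upper = min(1.0 / t_trans,               # transmitter is the bottleneck
                1.0 / t_rec,                 # receiver is the bottleneck
                window / cycle)              # buffers are the bottleneck
    return lower, upper


def saturation_window(t_trans, t_rec, t_trans_rec=0.0, t_rec_trans=0.0):
    """Smallest W for which the buffers no longer limit the upper bound."""
    cycle = t_trans + t_trans_rec + t_rec + t_rec_trans
    return math.ceil(cycle / max(t_trans, t_rec))
```

For example, with unit processing times at both ends (an assumed normalization), `saturation_window(1.0, 1.0)` gives 2 when the network delay is zero and `saturation_window(1.0, 1.0, 10.0, 10.0)` gives 22 when each propagation step takes ten time units.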

4.1 Negligible Local Area Network Delay

In a local area network, packet delay due to the network is presumed 
to be negligible compared to packet delay at the transmitter and 
receiver. This special case merits closer examination. From this
point on in this Section, we assume $T_{trans-rec} = T_{rec-trans} = 0$.
First, we assume that $W = 1$. If we do so, we see

$$\lambda_{upper} = \frac{1}{T_{trans} + T_{rec}} \tag{3}$$



In words, the maximum rate of transmitting packets is the reciprocal 
of the sum of the mean time spent by the transmitter plus the mean 
time spent by the receiver. 

Increasing the number of buffers from one to two ($W = 1$ to $W = 2$)
always increases the upper bound on the maximum mean throughput
rate, and now we see

$$\lambda_{upper} = \min\left[\frac{1}{T_{trans}}, \frac{1}{T_{rec}}\right] \tag{4}$$



Furthermore, this increase is maximized for $T_{trans} = T_{rec}$, and then
the upper bound doubles in going from one buffer to more than one
buffer. Why is this so? By having more than one buffer, both the
transmitter and receiver can simultaneously be filling and emptying a
buffer, allowing greater concurrency or parallelism compared with
the single buffer case. We also note that allowing more than two
buffers, e.g., infinite buffers, will not increase the upper bound on the
maximum mean throughput rate any further; intuitively, we only
have two serially reusable resources, the transmitter and receiver,
and double buffering keeps them both busy simultaneously. This
suggests investigating the three cases of single buffering, double
buffering, and more than double buffering, in the subsequent
sections. For the lower bound on mean throughput rate, we see that

$$\lambda_{lower} = \frac{1}{T_{trans} + T_{rec}} \tag{5}$$



which is identical to the upper bound for $W = 1$. Why is this so?
There may be significant fluctuation about the mean values shown 
above, and in the limit of one big swing about the mean value all of 
the messages will pile up at one stage in the network and nothing 
will be transmitted until buffers become available. 

4.2 Nonnegligible Local Area Network Delay

For the case where the local area delay is not negligible compared to 
the transmitter and receiver processing time per packet, the upper 
bound on mean throughput rate will increase as a linear function of 
the amount of buffering available at the receiver, until either the 
transmitter or the receiver becomes a limiting bottleneck. When 
does this in fact occur in our model? 

Figure 2 plots these upper and lower bounds, as well as the results of 
a Jackson queueing network analysis (e.g., Kelly, 1976, 1979), for
the special case where 



- T n 



Trec-u 



(6) 



In Figure 2A we have chosen the typical case, negligible network 
delay versus transmitter and receiver processing time, while in 2B all 
these times are equal to one another, and in Figure 2C the network 
delay is ten times the transmitter and receiver processing times.

Figure 2A. Maximum Mean Throughput Rate vs W

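The fraction of time the queueing network model predicts the system
to be in state J is denoted by $\pi(J)$, where

$$\pi(J) = \frac{1}{G}\, T_{trans}^{J_{trans}}\, \frac{T_{trans-rec}^{J_{trans-rec}}}{J_{trans-rec}!}\, T_{rec}^{J_{rec}}\, \frac{T_{rec-trans}^{J_{rec-trans}}}{J_{rec-trans}!} \tag{7}$$

The system partition function, denoted G, is chosen to normalize the
probability distribution:

$$G = \sum_{J_{trans} + J_{trans-rec} + J_{rec} + J_{rec-trans} = W} T_{trans}^{J_{trans}}\, \frac{T_{trans-rec}^{J_{trans-rec}}}{J_{trans-rec}!}\, T_{rec}^{J_{rec}}\, \frac{T_{rec-trans}^{J_{rec-trans}}}{J_{rec-trans}!} \tag{8}$$

Figure 2B. Maximum Mean Throughput Rate vs W

Figure 2C. Maximum Mean Throughput Rate vs W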
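
The partition function of such a closed product-form network can be computed by convolving one station at a time, and the mean throughput rate then follows as the ratio of partition functions for W - 1 and W packets. The sketch below assumes the standard closed Jackson network result with the transmitter and receiver as single-server stations and the two propagation steps as pure delay stations (W servers for at most W packets); it is not taken from the paper, and the function name is hypothetical.

```python
from math import factorial


def jackson_throughput(window, t_trans, t_rec, t_trans_rec=0.0, t_rec_trans=0.0):
    """Mean throughput G(W-1)/G(W) of the closed product-form network."""
    if window < 1:
        raise ValueError("window must be at least 1")

    def station_terms(mean, is_delay):
        # Unnormalized weight of holding j packets at a single station.
        if is_delay:
            return [mean ** j / factorial(j) for j in range(window + 1)]
        return [mean ** j for j in range(window + 1)]

    # g[n] is the partition function for n packets over the stations added so far.
    g = [1.0] + [0.0] * window
    stations = [(t_trans, False), (t_trans_rec, True),
                (t_rec, False), (t_rec_trans, True)]
    for mean, is_delay in stations:
        terms = station_terms(mean, is_delay)
        g = [sum(g[n - j] * terms[j] for j in range(n + 1))
             for n in range(window + 1)]
    return g[window - 1] / g[window]
```

Under these assumptions, with unit processing times and zero network delay the result is W / (W + 1), which coincides with the mean value bounds at W = 1.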

Two regimes are evident, one where the buffers are the bottleneck
and the mean throughput rate grows linearly in the number of
buffers, and one where the receiver is the bottleneck and the mean
throughput rate is independent of the number of buffers. For Figure
2A the transition occurs at $W = 2$; for Figure 2B it occurs at $W = 4$; for
Figure 2C it occurs at $W = 22$. As we see, there is little need for
large buffers at the receiver in a local area network, at this level of
analysis, provided the network delay is negligible compared to the
packet processing delay. As is evident from the figures, the Jackson
network analysis tracks quite closely the mean value upper bound on
throughput rate. Since the Jackson network analysis assumes the
packet processing times have significantly greater fluctuation about
their mean than the constant processing time per packet of actual
systems, and since the agreement (at this level of analysis) between
the mean value upper bound and the Jackson network is quite close,
this suggests using the mean value upper bound as a guide to setting
flow control parameters, because it is quite straightforward to
analyze.

5. Delay Statistics Analysis for W = 1 and W = 2 with Negligible Network
Delay

For the special case of $W = 1$ and $W = 2$ with $T_{trans-rec} = T_{rec-trans} = 0$,
we wish to calculate the moment generating function of the packet
delay random variable $T_{delay}$, which is measured from the instant a
packet arrives until it is completely processed by the transmitter and
receiver. We assume that packets arrive at the transmitter
according to simple Poisson statistics, i.e., the packet interarrival
times are independent identically distributed exponential random
variables with mean interarrival time $1/\lambda$. The packet processing
times at the transmitter and receiver are assumed to be independent
identically distributed random variables denoted by $T_{trans}$ and $T_{rec}$;
these have associated moment generating functions

$$E[\exp(-zT_{trans})] = G_{trans}(z) \tag{9}$$

$$E[\exp(-zT_{rec})] = G_{rec}(z) \tag{10}$$

5.1 Delay Statistics for W = 1 with Negligible Network Delay

For $W = 1$ the system acts as a single serially reusable resource whose
service time is the sum of the transmitter and receiver packet
processing times. The maximum mean throughput rate is the
reciprocal of the sum of the mean packet processing times at the
transmitter and receiver:

$$\lambda_{max} = \frac{1}{E(T_{trans}) + E(T_{rec})} \tag{11}$$

Figure 4. Queueing Network Block Diagram for W = 2

The first stage acts as a modified M/G/1 queueing system, with the
modification being that the initial service time of a busy period has a
different distribution from the other service times during a busy
period. The random variable $T_{init}$ denotes the initial service time at
the first stage during a busy period, and is given by

$$T_{init} = \max[T_{trans}, T_{rec} - T_A] = T_{trans} + \max[0, T_{rec} - T_A - T_{trans}] \tag{14}$$

The moment generating function for packet delay is given by

$$E[\exp(-zT_{delay})] = \frac{z\,[1 - \lambda E(T_{trans} + T_{rec})]}{z - \lambda[1 - G_{trans}(z)G_{rec}(z)]}\, G_{trans}(z)\, G_{rec}(z) \tag{12}$$

Packet delay is denoted by the random variable $T_{delay}$, measured
from the time a packet arrives until it leaves, and its mean is given
by

$$E(T_{delay}) = \frac{\lambda E[(T_{trans} + T_{rec})^2]}{2[1 - \lambda E(T_{trans} + T_{rec})]} + E(T_{trans}) + E(T_{rec}) \tag{13}$$
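
As a numerical illustration of the single-buffer case, the mean delay of a single-server queue with Poisson arrivals can be computed from the first two moments of the service time. The sketch below assumes the standard Pollaczek-Khinchine mean value formula for an M/G/1 queue, with the service time taken to be the sum of the transmitter and receiver processing times; the function and parameter names are hypothetical.

```python
def mean_delay_single_buffer(arrival_rate, mean_service, second_moment_service):
    """Mean packet delay for W = 1, treating the system as one M/G/1 queue.

    mean_service is E(T_trans + T_rec) and second_moment_service is
    E[(T_trans + T_rec)^2].
    """
    utilization = arrival_rate * mean_service
    if not 0.0 <= utilization < 1.0:
        raise ValueError("arrival rate must keep utilization below one")
    # Mean waiting time in queue, then add the mean service time itself.
    waiting = arrival_rate * second_moment_service / (2.0 * (1.0 - utilization))
    return waiting + mean_service
```

For constant unit processing times at both ends, the service time is always 2, so its second moment is 4; at an arrival rate of 0.25 the function returns a mean delay of 3.0.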

5.2 Delay Statistics for W = 2 with Negligible Network Delay

Figure 3 shows an illustrative arrival pattern and completion pattern
of packets for the $W = 2$ case. This suggests analyzing the delay
statistics for this case in two stages:

where $T_A$ is the arrival time of the packet that ends the idle period and
makes the transmitter busy. Note that for the case where
$T_{trans} > T_{rec}$, e.g., when both times are constant, this initial interval
involves transmitter delay, and all the delay will occur at the
transmitter and none at the receiver. The random variable $T_{max}$
denotes the service times of all but the first packet during a busy
period:

$$T_{max} = \max[T_{trans}, T_{rec}] \tag{15}$$

The moment generating functions for these random variables are
denoted by

$$E[\exp(-zT_{init})] = G_{init}(z) \tag{16}$$

$$E[\exp(-zT_{max})] = G_{max}(z) \tag{17}$$

The distribution for the interarrival time is given by

$$PROB[T_A \le X] = 1 - \exp(-\lambda X), \qquad \lambda = 1/E[T_A] \tag{18}$$

If the packet delay distribution is nontrivial, the mean arrival rate
must be less than the reciprocal of the mean of $T_{max}$:

$$\lambda E[T_{max}] = \lambda E[\max(T_{trans}, T_{rec})] < 1 \tag{19}$$

To determine the delay statistics, we look at completion times of
packets finishing the first stage of processing. At these epochs we can
associate an imbedded Markov chain, where $N_k$, $k > 0$, denotes the
number of packets in system at the kth completion epoch. From
earlier definitions, we can write

The invariant measure associated with the imbedded Markov chain
for $N_k$, $k > 0$, is denoted by $\pi_k$, which has moment generating
function $\hat{\pi}(Y)$:

$$\hat{\pi}(Y) = \sum_{k \ge 0} \pi_k Y^k \tag{22}$$

Figure 3. Illustrative Operation for W = 2

[1] The first stage is the time from packet arrival at the
transmitter to the start of service at the receiver.

[2] The second stage is the time from the start of service at the
receiver to the completion of service.

Figure 4 shows a queueing network block diagram of this modified
system.



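We now substitute into this expression using previously defined
functions:

$$\hat{\pi}(Y) = \pi_0\, G_{init}(\lambda(1 - Y)) + \frac{\hat{\pi}(Y) - \pi_0}{Y}\, G_{max}(\lambda(1 - Y)) \tag{23}$$

Rearranging this, we find

$$\hat{\pi}(Y) = \pi_0\, \frac{Y\, G_{init}(\lambda(1 - Y)) - G_{max}(\lambda(1 - Y))}{Y - G_{max}(\lambda(1 - Y))} \tag{24}$$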



In order for $\hat{\pi}(Y)$ to be a proper moment generating function, we
require $\hat{\pi}(1) = 1$, and hence

$$\pi_0 = \frac{1 - \lambda E[T_{max}]}{1 - \lambda E[T_{max}] + \lambda E[T_{init}]}$$

Now we have an expression for the moment generating function for
the number in system at completion epochs of the first stage of
packet processing. We want to relate this to packet delay. Because
the arrival statistics were assumed to be Poisson, the moment
generating function for the time spent in the first stage equals the
moment generating function $\hat{\pi}(Y)$ evaluated at $Y = (\lambda - z)/\lambda$, which
is a deep generalization of Little's Law (Conway, Maxwell, Miller,
1967, Eq. 20, p. 156). Hence, the total packet delay moment
generating function is now known:



$$E[\exp(-zT_{delay})] = \hat{\pi}\left(\frac{\lambda - z}{\lambda}\right) G_{rec}(z) \tag{25}$$

examine two special cases for $W \to \infty$ assuming
$T_{trans-rec} = T_{rec-trans} = 0$. The packet arrival statistics are Poisson: the
interarrival times are independent identically distributed exponential
random variables with mean interarrival time $1/\lambda$. The transmitter
and receiver packet processing times are independent identically
distributed random variables, with mean transmitter and receiver
packet processing times denoted by $T_{trans}$ and $T_{rec}$ respectively.

6.1 Exponentially Distributed Packet Processing Times 

The first case is where the transmitter and receiver packet processing 
times are mutually independent exponential distributions. This is a 
Jackson network (Kelly, 1976, 1979) and we merely cite the results

For the special case where the transmitter and receiver packet
service times are constant, the delay distribution for $W = 2$ is
identical to that for $W \to \infty$.

The mean packet delay is given by 

ElTdelay] - 



Ell 



(27) 
) 



The first term is due to the start of a busy period and includes the
transmitter packet processing time, the second term is due to the
waiting time in the buffer, and the third term is the receiver packet
processing time. For the special case where the transmitter and
receiver packet processing times are constant, we have plotted
illustrative results in Figure 5; the mean delay for $W = 2$ is identical
to that for $W \to \infty$, and for normal operating regions is significantly
smaller than for $W = 1$.

Figure 5. Mean Packet Delay vs Arrival Rate

expression for $W \to \infty$, which shows how close the $W = 2$ delay
statistics can in fact be.

6. Delay Statistics Analysis with Negligible Network Packet Delay for $W \to \infty$

Our goal is to calculate the moment generating function of the
packet delay random variable $T_{delay}$, measured from the instant a
packet arrives at the transmitter until it departs the receiver. We

The latter form is given to suggest that the mean packet processing
times at the transmitter and receiver are augmented by a form
suggestive of the mean waiting time for an M/M/1 queueing system,
for the transmitter and the receiver.

6.2 Deterministic or Constant Packet Processing Times 

The second case is where the transmitter and receiver packet
processing times are deterministic or constant. Packets are assumed
to arrive at time instants denoted by $t = 0, A_1, A_1 + A_2, \ldots$ so that $A_k$
is the interarrival time between the (k-1)th and kth packet. The
time spent by the kth packet waiting to begin service at the
transmitter and at the receiver is denoted by $T_{trans,W_k}$ and $T_{rec,W_k}$
respectively.

For $W \to \infty$, the following recursions specify the waiting time
sequences for packets at the transmitter and receiver:

$$T_{trans,W_{k+1}} = \max(0, T_{trans,W_k} + T_{trans} - A_{k+1}) \tag{29}$$

$$T_{rec,W_{k+1}} = \max(0, T_{rec,W_k} + T_{rec} - (C_{k+1} - C_k)) \tag{30}$$

where $C_k$ is the completion time of the kth packet by the
transmitter:

$$C_k = A_1 + \cdots + A_k + T_{trans,W_k} + T_{trans} \tag{31}$$



Our goal is to show that the total time spent by a packet waiting at
the transmitter and the receiver, denoted by $T_{W_k}$, is given by

$$T_{W_{k+1}} = \max[0, T_{W_k} + T_{max} - A_{k+1}] \tag{32}$$

where $T_{max} = \max[T_{trans}, T_{rec}]$ is the larger of the two packet
processing times.

Two cases arise. First, if the transmitter is the slower of the pair,
i.e.,

$$T_{trans} = T_{max} = \max[T_{trans}, T_{rec}] \tag{33}$$

then $T_{rec,W_k} = 0$ for all values of k, i.e., there is no waiting at the
receiver at all. The closest spacing in time of packets departing from
the transmitter is at least $T_{rec}$, and hence we have shown

$$T_{W_{k+1}} = \max[0, T_{W_k} + T_{max} - A_{k+1}] \tag{34}$$

The second case, $T_{trans} < T_{max}$, can be handled in two steps. First, if

$$T_{trans,W_k} + T_{trans} - A_{k+1} > 0 \tag{35}$$

then from the recursions we see

$$C_{k+1} - C_k = T_{trans} \tag{36}$$

and hence

$$\begin{aligned} T_{trans,W_{k+1}} + T_{rec,W_{k+1}} &= T_{trans,W_k} + T_{trans} - A_{k+1} + \max[0, T_{rec,W_k} + T_{max} - T_{trans}] \\ &= T_{trans,W_k} + T_{rec,W_k} + T_{max} - A_{k+1} > 0 \end{aligned} \tag{37}$$

or in other words

$$T_{W_{k+1}} = T_{W_k} + T_{max} - A_{k+1} \tag{38}$$

as was desired. The second special case is the converse: if

$$T_{trans,W_{k+1}} = 0, \qquad T_{trans,W_k} + T_{trans} - A_{k+1} \le 0 \tag{39}$$

so that

$$C_{k+1} - C_k = A_{k+1} - T_{trans,W_k} + T_{trans,W_{k+1}} = A_{k+1} - T_{trans,W_k} \tag{40}$$

Substituting this into the earlier recursions, we see

$$T_{trans,W_{k+1}} + T_{rec,W_{k+1}} = \max[0, T_{trans,W_k} + T_{rec,W_k} + T_{max} - A_{k+1}] \tag{41}$$

or in other words

$$T_{W_{k+1}} = \max[0, T_{W_k} + T_{max} - A_{k+1}] \tag{42}$$

and hence we have obtained our desired result.
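
The equivalence just derived can be checked numerically by running the two-stage recursions side by side with the single-stage recursion. The sketch below is only an illustrative check under the stated model (constant processing times, unlimited window, the first packet finding the system empty); the function names are hypothetical.

```python
def tandem_waits(interarrivals, t_trans, t_rec):
    """Total waiting time of each packet from the two-stage recursions."""
    w_trans = 0.0          # wait of the current packet at the transmitter
    w_rec = 0.0            # wait of the current packet at the receiver
    totals = [0.0]         # the first packet finds the system empty
    for gap in interarrivals:
        next_w_trans = max(0.0, w_trans + t_trans - gap)
        # Spacing of successive transmitter completions: the interarrival
        # time plus the change in transmitter waiting time.
        spacing = gap + next_w_trans - w_trans
        w_rec = max(0.0, w_rec + t_rec - spacing)
        w_trans = next_w_trans
        totals.append(w_trans + w_rec)
    return totals


def single_stage_waits(interarrivals, t_trans, t_rec):
    """Waiting times from the single recursion driven by the slower stage."""
    t_max = max(t_trans, t_rec)
    wait = 0.0
    waits = [0.0]
    for gap in interarrivals:
        wait = max(0.0, wait + t_max - gap)
        waits.append(wait)
    return waits
```

For any interarrival sequence the two functions should return the same list, up to floating point rounding when the inputs are not exactly representable.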
Using this recursion, the mean packet delay is given by

$$E(T_{delay}) = \frac{\lambda T_{max}^2}{2(1 - \lambda T_{max})} + T_{trans} + T_{rec}$$

6.3 Interpretation of Results 

For the first case, with the transmitter and receiver processing times
exponentially distributed, the mean packet delay is identical to that
of two M/M/1 queueing systems. The mean packet processing time
is inflated or multiplied by the reciprocal of the fraction of time each
stage is idle.

For the second case, with the transmitter and receiver processing times
deterministic or constant, the mean packet delay looks like that of a
single M/D/1 queueing system, with the slowest stage contributing
all the waiting time, while all the stages contribute to the packet
processing time delay. For the case where $T_{trans} > T_{rec}$ this will be
the case; for the case where $T_{trans} < T_{rec}$, in fact there will be some
delay at the transmitter stage, simply due to the bursty nature of the
Poisson process (i.e., an arrival has a nonzero probability of occurring
within the transmitter packet processing time of a previous arrival),
but the system behaves as if all the waiting were at the transmitter.
Due caution is needed.
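
The two interpretations can be compared numerically. The sketch below assumes the standard mean delay results for M/M/1 and M/D/1 queues, applied as the text describes: each exponential stage has its mean processing time divided by its idle fraction, while in the constant case only the slower stage contributes waiting time. The function names are hypothetical.

```python
def mean_delay_exponential(arrival_rate, t_trans, t_rec):
    """Mean delay for two M/M/1 stages in series (exponential processing)."""
    total = 0.0
    for mean in (t_trans, t_rec):
        idle_fraction = 1.0 - arrival_rate * mean
        if idle_fraction <= 0.0:
            raise ValueError("each stage must have utilization below one")
        total += mean / idle_fraction   # processing time inflated by 1 / idle
    return total


def mean_delay_constant(arrival_rate, t_trans, t_rec):
    """Mean delay for constant processing times: M/D/1 wait at the slower stage."""
    t_max = max(t_trans, t_rec)
    idle_fraction = 1.0 - arrival_rate * t_max
    if idle_fraction <= 0.0:
        raise ValueError("the slower stage must have utilization below one")
    waiting = arrival_rate * t_max ** 2 / (2.0 * idle_fraction)
    return waiting + t_trans + t_rec
```

With unit processing times and an arrival rate of 0.5, the exponential case gives a mean delay of 4.0 and the constant case 2.5 under these assumptions.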

This is a partial summary of efforts to design, model, measure and
optimize a large, Hyperchannel-based, local network. I elaborate upon
the cyclic interaction of these four activities and contrast our
successes and failures in each with its costs.

Keywords: discrete event simulation; Hyperchannel based
network; model validation; network monitoring; network
performance evaluation.



1. Introduction

Clark, Pogran and Reed [1] have
suggested that the issues of local
network design are
either configuration issues or protocol
issues. They visualize networks as
consisting of four basic elements: the
transmission medium, a control mechanism,
the interfaces and the protocols.
Network performance is not only strongly
dependent upon each of these elements,
but also upon their mutual interactions.

As has been pointed out by
Sunshine [2], Pouzin and Zimmermann, and
others, network traffic properties such
as message sizes, rates and their
distributions have a great effect on
network performance, and further that
performance is not only strongly
dependent on the traffic but also upon
the mutual interaction of the traffic,
the configuration and the protocols.

The study and characterization of
these interactions is one of the goals of
current local network research at LLNL.
More specifically, the Local Network
Research Group (LNRG) is currently
investigating the traffic characteristics
and network performance in an operational
High Data Rate Local Network (HDRLN) as
part of a study being conducted for the
National Bureau of Standards (NBS). As
currently envisioned, the NBS sponsored
investigation will not only rely on
monitoring and measuring techniques, but
also make extensive use of a model, a
comprehensive, discrete event, computer
simulation of Network Systems
Corporation's (NSC) Hyperchannel.

Monitors and models can be and have
been of critical importance in the
design, implementation and tuning of
local area networks (LAN) and network
operating systems (NOS). For example,
modelling and simulation can verify and
explore the anticipated performance of
tentative LAN designs well in advance of
their actual implementation. Once
implemented, modelling and monitoring can
help the designer better understand and
tune the implementation, enabling him to
enhance its performance in some desirable
way. This is basically an iterative or
feedback process that is a part of the
larger, cyclic, design-measure-model
process depicted in figure 1. Modelling
and monitoring provide the feedback
without which effective design is not
possible. This design-measure-model
process is certainly nothing new and I
only mention it here to stress its cyclic
nature and that much of its usefulness
derives from the fact that it is cyclic.
Modelling, for example, is a way of
checking out a design through simulating
its behavior in advance of (or
concurrently with) its actual
implementation. Ideally, as flaws are
detected or enhancements discovered
during the design's simulation,
beneficial modifications are fed back
into the original design and its model,
the whole process being repeated. This
is a cyclic process that is hopefully
speedily and cheaply convergent. Today,
no serious designer of network protocols
and hardware fails to simulate his
designs. It is probably the only
practical way of knowing whether complex
state diagrams are complete and accurate,
or of selecting one from among a set of
competing designs.

Figure 1. The cyclic, design-model-measure process,
showing various possible feedback paths.

Figure 1 indicates that the points
of feedback in this cyclic process occur
at the modelling and measuring locations.
Without modelling and measuring,
successful and efficient design and
implementation of things as complex as
networks becomes difficult if not
altogether impossible. If you don't
(can't) model it, how can you hope to
understand it, and if you don't measure
it how do you know if it's working right
(how can you hope to fix it)?

So, the NBS project will proceed by
a composite of simulation and measurement
methodologies in the following way: we
will use a monitor to study the
performance of LLNL's HDRLN, the Craynet
(a Hyperchannel based LAN); we will also
use this monitor to validate LLNL's
discrete event, computer simulation of
the Hyperchannel; and we will use this
validated simulation to characterize the
performance of a HDRLN at extrapolated
loads (loads derived from measured values
by extrapolation).

I shall use the cyclic,
design-model-measure paradigm as the
schema throughout the remainder of the
paper for elaborating upon our
experiences, our successes and failures,
and the costs.



2. Design 

The Craynet, figures 2, 3 and 4, a
high data rate local network (HDRLN), is
the only access the users have to LLNL's
complement of three (soon to be four)
Cray computers. It is also the only
access the Crays have to facilities such
as the long term storage and transfer of
files as well as to the usual output
devices.

Figure 2. The CRAYNET, a schematic showing connectors, spacing, adapter types and attached hosts.

Figure 3. OCTOPUS NETWORK, with the CRAYNET subnet shown in heavy black lines.

Figure 2 is a schematic diagram of
the various Network Systems Corporation
(NSC) Hyperchannel adapters that are
attached (or will soon be attached) to
the Craynet. It shows their spacing
along the Hyperchannel coaxial cable, the
connectors each one uses as well as the
computer/service to which each one is
connected.

Figure legend: A = adapter number; X = transfer rate (Mbps); P = adapter
priority; Device = host/device; TYPE = adapter type. Message classes
shown: 320-3200 kbits/msg, burst mode; 320-3200 kbits/msg, multiplex
mode; 1 kbits/msg, multiplex mode; 25-250 kbits/msg, multiplex mode.

Figure 4. An older, single CRAY version of CRAYNET showing traffic patterns and message sizes.



Octopus, figure 3, LLNL's computer
network, has evolved over the last 15
years as a packet switched, store and
forward, partially connected mesh
network. The Hyperchannel based Craynet
is represented in figure 3 by the heavy
black lines. It is the latest addition
to the Octopus network.

Figure 3 depicts the logical
interconnection of the three Cray
computers, a plethora of storage devices,
output devices and terminal
concentrators. The Craynet, a broadcast
network, is thought of as an adjunct to
the Octopus network. It was looked upon
as a simple way of achieving many point
to point paths amongst all the various
host machines, output/storage devices,
and terminal concentrators for all newly
acquired computers.

Indeed, some of the traffic on the
Craynet is store and forward traffic from
the Octopus network: internetwork traffic
requiring certain machines to act as
gateways. For example, files from the
Crays destined for the automated tape
library (ATL) must pass through the
multiple access storage system (MASS),
which acts in this instance as a gateway
to the older part of the Octopus network.
The actual path of transfer is then
Cray=>MASS=>7600-S=>PDP-10=>ATL. I mention
this because we at LLNL tend to think of
and use (and possibly short sightedly so)
the Hyperchannel as just another link in
the existing message network, the
Octopus.

Table 1 presents the data paths 
within the Craynet indicating the message 
size and modes of transmission per path. 
It is interesting but not surprising to 
observe that many paths are not used. 
Indeed, if one considers only the major 
traffic and assumes that messages never 
need to travel in store and forward mode, 
the Craynet configuration logically 
begins to resemble a star, with the three 
Cray adapters as the central node and the 
concentrator/storage/output adapters as
the peripheral nodes. Figure 4 depicts 
this for an older, single Cray version of 
the Craynet. 

The Crays are large, fast computers
operated under timesharing. Their
principal use is in the numerical
solution of large systems of partial
differential equations pertaining to
physics calculations. During the day,
most programmers/physicists are
developing new versions of their
calculations or debugging existing ones.
In addition, a large number of users are
preparing input to these calculations to
be performed during nightly production
runs. Finally, a lot of users use these
machines to further process and analyze
results and output from previous nights'
production runs. During the night and on
weekends, the Crays are mainly used in
production mode.

There are radical differences
between the daytime, user generated tasks
and nighttime, production oriented tasks
of the Crays. During the day, the tasks
are numerous, small and execute for short
periods of time. The opposite conditions
occur at night. Because the timesharing
system is tuned to reflect this
disparity, interactivity is very poor
during the nighttime hours, and large
programs seldom get loaded or executed
during the daytime hours.



3. Model

It is unfortunate from the
standpoint of this cyclic paradigm whose
virtues I am trying to extol that the
Hyperchannel, the Craynet, and the
Octopus network itself were designed and
implemented without the aid of any formal
modelling or computer simulation
techniques.

Who knows how much easier it would
have been to do all of this, or how much
more efficient these designs could have
been had modelling been used?
Unfortunately, history does not record
its alternatives, and it's difficult if
not silly to try to argue with success.

As an aside, it's worth noting that
during the initial design of Ethernet,
extensive use was made of modelling and
computer simulation techniques to
eliminate the useless or deleterious
features, and optimize the favorable
ones. By such a process, its designers
finally arrived at what has come to be
called Ethernet: carrier sense multiple
access with collision detection (CSMA/CD)
with binary backoff and retry. Both the
original Ethernet and the now
commercially available product perform
excellently the tasks for which they were
designed.

The Hyperchannel, on the other hand,
which predates the Ethernet by quite a
few years as a commercial product, was
designed and implemented almost entirely
without the use of models and simulation.
With its CSMA media access method and
link level protocols, it is capable of
operating faster than 8000 messages a
second (512 bytes per message), i.e., 120
microseconds per message. Yet when the
various hosts in the Craynet, figures 2,
3 and 4, are subjected to loads such as
given in tables 1, 2 and 3, they are
unable to present to the network a load
anywhere near this magnitude, perhaps a
few hundred messages per second on a
sustained basis (and during peaks?).
Well, in 1977, as part of a research
project at LLNL to evaluate Hyperchannel

Table 1. Showing message sizes, paths and transmission modes for the CRAYNET. Here, O represents 1.2-250 kbits/msg
sent in multiplex mode; • represents 0.49 mbits/msg sent in burst mode, and 0 represents 0.49 mbits/msg sent in
multiplex mode.
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Table 2. CRAYNET message sizes, frequencies, modes and paths where BM = data stream mode, 
and MM = packet mode. 



Source         Sink             Msg.        Max. Msg.
Adapter #      Adapter #        Size        Frequency   Msg.
(= priority)                    (K-bits)    (msg/sec)   Mode*  Description
-------------  ---------------  ----------  ----------  -----  -----------------------------------
1 (12,22,25)   4,5              490                     BM     CRAY-D to MASS (file xfer)
1 (12,22,25)   16               490                     BM     CRAY-D to CDC 819 DISKS
2 (13,23,26)   10               490         <.07(?)     MM     CRAY-D to STC tapes
3 (14,24,27)   7                2-250 (?)               BM     CRAY-D to SEL
3 (14,24,27)   8,9              .2-.8       4           MM     CRAY-D to terminals
3 (14,24,27)   11,15            .2-.8       (?)         MM     Store & fwd backup terminal msgs
3 (14,24,27)   16               32-262                  MM     CRAY to TMDS (Text-Pix Display)
3 (13,23,26)   19,20            360         5           MM     CRAY to CHORS
4, 5           12,14,22,25      490                     BM     MASS to CRAYs (file xfer)
4, 5           17               490                     BM     MASS to 7600 (file xfer store/fwd)
4, 5           7,8,9,11,21      .018-.8     (?)         MM     Store & fwd backup terminal msg
6
7              3,14,24,27       .018        <2(?)       MM     SEL to CRAYs terminal MS
7              8,9,11,15        .018-.8     (?)         MM     SEL to TTYs (store & fwd), I/O
7              21               (?)         (?)         MM     SEL to TMDS (?)
8              3,14,24,27       .018        <2(?)       MM     µ-mux to CRAYs input TTY msg
8              9,11,15,21       .018-.8     (?)         MM     µ-mux, store & fwd TTY msg, I/O
9              3,14,24,27       .018        <2(?)       MM     PDP-11 to CRAYs input TTY msg
9              11,15,21         .018-.8     (?)         MM     PDP-11, store & fwd TTY msg I/O
10             13,2,23,26       490         (?)         MM     STC Tape to CRAYs
11             3,14,24,27       .018        (?)         MM     PDP-10 to CRAYs store/fwd TTY input
11             17               490         (?)         BM     PDP-10 to 7600-S store/fwd xfer
15             3,14,21,24,27    .018-.8     (?)         MM     VAX store/fwd TTY msg I/O
16             1,12,22,25       490         (?)         BM     819's to CRAYs file xfer
17             4,5              490         (?)         BM     MASS to CRAYs file xfer
17             11               490         (?)         BM     7600s to PDP-10 store/fwd file
18
19                                                             CHORS
20                                                             CHORS
21             3,14,24,27       .018-.8     (?)         MM     TMDS to CRAY msg store/fwd
16             4,5,7            490         (?)         BM     SEL/MASS store/fwd file xfer via 8J
21             4,5                          (?)         MM     TMDS to MASS
21             7,8,9,11,15      .018-.8     (?)         MM     TMDS, store/fwd TTY msg I/O






Key to Tables 1 and 2

STC     STORAGE TECHNOLOGY CORP.     tape controller and 6250 bpi drive
CRAY    CRAY RESEARCH CORP. CRAY-1   computer
7600    CONTROL DATA CORP. 7600      computer
PDP-11  DIGITAL EQUIPMENT CORP.      secondary terminal concentrator
VAX     DIGITAL EQUIPMENT CORP.      computer
PDP-10  DIGITAL EQUIPMENT CORP.      computer
TMDS    LCC designed                 television display monitor system
819's   CONTROL DATA CORP.           network disk(s)
SEL     SYSTEMS ENGINEERING CORP.    internet gateway (octoport)
µ-MUX   LCC designed                 terminal concentrator 9600 baud
MASS    LCC designed                 multiple access storage system (CDC 38/500, TI-980)
CHORS   LCC designed                 computer hardcopy output recording system (18000 lpm printer and µfilm)



based networks and their potential, a 
discrete event computer simulation of the 
Hyperchannel was developed. The initial 
thrust of this project was to explore the 
performance of such networks, but very 
quickly its goal became the detailed 
examination of Hyperchannel functionality 
(adapter hardware and protocols). Much 
was learned and much reported
[4,5,6,7,8,9], mostly about certain
shortcomings and inadequacies of these 
protocols, and their impact on 
performance at loads higher than occur in 
the Craynet. 

I bring this up only to indicate the 
feedback that took place subsequently in 
the design of their new product, the 
Hyperbus. NSC did cooperate extensively 
with us during the development of the 
Hyperchannel model, and corrected those
problems brought out by simulation that
were economically feasible to correct.
However, a few potentially serious
shortcomings remained, notably in the
area of buffer management. Today if one
compares Hyperchannel functionality with
that of the new Hyperbus, one can't help
but observe that many of these
shortcomings were corrected in its
design.

What I am describing here is a 
design-model feedback path that 
transcended not only a succession of 
products, but also the relationship 
between vendor and customer, where all of 
the designing occurred in a private 
corporation and all of its simulation 
occurred in a government laboratory. As 
one could expect, such a feedback path is 
greatly attenuated by the conflict 
between a corporation's need to protect
its ideas and the government's insistence 
that the results of all publicly funded 
research end up in the public domain. 
Add to this conflict of needs the legal 
restriction that the U. S. government 
may not fund a project that would result 
in an unfair advantage for some 
corporation(s) over its competitors. 

As another example, I have been
assured on several occasions by people
within Control Data Corporation (CDC)
that not only did they benefit from our
studies of the Hyperchannel in the design
of their own network product, the Loosely
Coupled Network (LCN), they also made
great use of their own computer models
and simulations.


And lastly, the American National
Standards Institute (ANSI) also
benefitted from the LLNL and CDC
simulations; that is, the ANSI
committee X3T9.5 proposed standard, the
Local Distributed Data Interface (LDDI),
which is currently in the final stages of
approval, embodies basically a
Hyperchannel-like design with, in my
opinion, all of its shortcomings and
inadequacies corrected. The point I am
trying to make is that in at least three
instances a design was improved by the
feedback of information obtained as a
direct result of the simulation of the
original design.

What did all of this cost? The
costs lie mainly in three categories:
language costs, model development costs,
and simulation execution costs. We used
the ASPOL language (a CDC software
product for 6600 and 7600 machines only),
which operates only under CDC operating
systems. As simulation languages go, it
was fairly inexpensive to acquire and to
master (compared to CACI's SIMSCRIPT),
but not particularly powerful or
ubiquitous (compared again to SIMSCRIPT).
The model required about one half of a
man year to program and debug. Finally,
computer simulation used about an hour of
CDC-7600 cpu time per study (depending of
course on the nature of the study) for
the type we did [4,5,6,7,8,9].

Before I leave this section on 
modelling/simulation, I'd like to explain 
how the model was used to help design a 
network monitor (currently being built 
for LLNL by an outside corporation 
entirely with funds provided by NBS). If
the reader wishes to avoid reading a lot 
of gory detail about buffer size 
determinations, he may skip ahead to the 
measurement section. 

The Hyperchannel Monitor Device 
(HMD), figure 5, will detect, time-stamp, 
bus-label and write to magnetic tape 
selected portions of all frames that
appear on each Hyperchannel transmission
cable (bus) to which it is attached. 
Specifically, the selected portions are 
the frame header and part of the frame 
body, figure 6. 




Figure 5. A schematic of the HMD showing data paths
(dotted lines), position of critical buffers and FIFOs,
and maximum data transfer rates in megabits/second.



The HMD is totally passive as far as
the Craynet is concerned, in that it never
originates any network messages of its
own, nor is it ever the explicit recipient
of any. It will not impair the normal
operation of the Craynet in any way. It
attaches to the Hyperchannel at the 
physical end of the buses replacing the 
terminators that are usually there. 

Since each bus normally operates at
a rate of 50 megabits/second (Mbps), the
HMD must be able to copy bits from each
bus into internal buffers at this same
rate. The 50 megabits/second rate is the
instantaneous maximum rate at which bits
could arrive at the HMD. However, due to
the time multiplexed aspect of the
Hyperchannel bus access strategy, and the
fact that it is only the fronts of frames
that are copied, the average, sustained
maximum data rate is only a few
megabits/second.

Part          Field              Bits
------------  -----------------  ----
Frame header  Sync               24
              Frame code         8
              Access code        16
              To                 8
              From               8
              Response           16
              Length count       16
              Header checkword   16
Data field    Data               n
              Data checkword     16
              Sync               56

Fig. 6. Hyperchannel frame format.

As the HMD collects data, it writes 
it to a magnetic tape. Data reduction 
and data base management routines will 
subsequently analyze the data on these 
tapes on some suitable computer(s) in the 
Livermore Computer Center (LCC).

At the outset, by far the most 
critical design question we had to answer 
was, Could the various internal buffers 
in the HMD keep up with the data rates 
expected to exist on the Craynet? To 
answer this question, we used the 
discrete event simulation model of the 
Hyperchannel. We subjected the modelled 
Hyperchannel, figure 2, to a steady state 
load similar to those discussed in the 
above section and depicted in the plots 
at the end of this memo. As far as we 
could tell, this load is real and 
typical, and amounts to about 5% of the 
Craynet's capacity.

Briefly reviewing the internal
buffers of the HMD, figure 5, there is a
FIFO buffer that receives data from the
Hyperchannel bus, i.e. there is one FIFO
per bus in the network. Each FIFO is
capable of receiving data at the bus rate
of 50 Mbps. Each FIFO consists of 32 (or
64, 96, 128) 128-byte entries. Each
entry can hold the data captured from one
and only one Hyperchannel frame. Not all
of the bits in a Hyperchannel frame are
desired, only the leading bits containing
information about the frame's function,
size and routing, figure 6. So, no more
than 77 bytes of data are ever captured
from any Hyperchannel frame. There is,
in addition, a 5-byte time stamp and a
1-byte bus label that must ultimately be
appended to these data bytes prior to
their being written to tape.
Consequently, the maximum amount of data
to be written to tape for each frame
appearing on a Hyperchannel bus is
83 bytes. To simplify the subsequent
retrieval of data from this tape, all
tape records must be some multiple of
this 83-byte logical record.
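
The byte counts above can be checked with a few lines of Python. This is only an illustrative sketch: the field widths are those of figure 6, the 77, 5 and 1 byte figures are those quoted in the text, and the function and constant names are hypothetical, not part of the HMD design.

```python
# Field widths in bits, as listed for the frame header in figure 6.
HEADER_FIELDS = {
    "sync": 24,
    "frame_code": 8,
    "access_code": 16,
    "to": 8,
    "from": 8,
    "response": 16,
    "length_count": 16,
    "header_checkword": 16,
}

MAX_CAPTURED_BYTES = 77  # most data ever copied from one frame (from the text)
TIME_STAMP_BYTES = 5     # appended by the HMD
BUS_LABEL_BYTES = 1      # appended by the HMD


def header_bytes():
    """Size of the frame header in bytes, summing the figure 6 fields."""
    return sum(HEADER_FIELDS.values()) // 8


def logical_record_bytes():
    """Largest amount of data written to tape for a single frame."""
    return MAX_CAPTURED_BYTES + TIME_STAMP_BYTES + BUS_LABEL_BYTES


def tape_record_bytes(frames_per_record):
    """Physical tape record size when it holds a whole number of logical records."""
    return frames_per_record * logical_record_bytes()
```

Under these figures the header alone accounts for 14 of the captured bytes, and the remainder of the 77 comes from the front of the frame body.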

To continue, the FIFO empties into 
an intermediate buffer in the 8089 bus 
board in which it resides. Each 8089 
buffer is easily capable of keeping pace 
with its FIFO (but not with the
Hyperchannel bus), so this interface is
not likely to be a bottleneck and
requires no further attention.

Each 8089 buffer empties over the 
multibus into the 8086 buffer, moving 
over this path at a maximum rate of 16 
Mbps. Finally, the 8086 buffer empties 
to tape, also over the multibus, at a 
maximum rate of 1.6 Mbps (1600BPI at 
125IPS). 
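
As a quick check of the tape figure, the sketch below recomputes the peak write rate. It assumes that 1600 BPI means 1600 bytes per inch along the tape (one byte recorded across the tracks at each position), which is an interpretation rather than something stated above; the function name is hypothetical.

```python
def tape_rate_bps(density_bpi, speed_ips, bits_per_byte=8):
    """Peak tape write rate in bits/second.

    Assumes one byte is recorded per position along the tape, so the
    density counts bytes per inch of tape length.
    """
    return density_bpi * speed_ips * bits_per_byte


TAPE_BPS = tape_rate_bps(1600, 125)  # 1600 BPI at 125 IPS
MULTIBUS_BPS = 16_000_000            # 8089-to-8086 path, from the text

# How many times faster the multibus path is than the tape drive.
SPEED_RATIO = MULTIBUS_BPS / TAPE_BPS
```

On this reading the tape is the slowest stage in the chain by a factor of ten, which is why the buffer in front of it receives most of the attention in what follows.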

We will ignore for the time being 
such concerns as whether or not a FIFO 
can simultaneously fill from the bus side
and empty out the 8089 side. Similarly, 
we need to know that the 8089 and 8086 
buffers can empty and fill 
simultaneously, and if not, what effect 
this will have on their data transfer 
rates. 

The first question examined with the 
aid of the simulation model was, Can a 
FIFO keep up with the Hyperchannel bus 
for the presented loads? Two 32X128-byte 
FIFOs can empty at 16 Mbps in about 4 
milliseconds, assuming that all 128 bytes 
of each entry move to the appropriate
8089 buffer regardless of whether they 
hold data or not. If this is the case, 
we next ask how many frames will appear 
on the two Hyperchannel buses in any 
given 4 millisecond window? 
Specifically, is it more than 64? What
we are dealing with now is not bit rates
but frame and entry rates. The model
indicates that for the given load,
monitored for a 20 second period, 85% of
the 4 millisecond windows contain fewer
than 10 frames, but the probability is
almost 100% that at least three 4
millisecond windows will contain in
excess of 64 frames. The implication is
that the FIFOs will overflow once every
6.66 seconds, on the average, for the
given load. That is, no FIFO of any depth
can hope to keep up with the Hyperchannel
if it is emptied as described above.
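
The two figures in this argument, the 4 millisecond drain time and the 6.66 second overflow interval, follow from simple arithmetic. The sketch below reproduces them; the function names are hypothetical and the inputs are the values quoted in the text.

```python
def drain_time_s(n_fifos, entries, entry_bytes, drain_bps):
    """Time to empty every entry of every FIFO, moving whole entries."""
    total_bits = n_fifos * entries * entry_bytes * 8
    return total_bits / drain_bps


def mean_overflow_interval_s(observed_s, overflow_windows):
    """Average spacing between overflows seen in an observation period."""
    return observed_s / overflow_windows


# Two 32-entry FIFOs of 128-byte entries, emptied at 16 Mbps.
DRAIN_S = drain_time_s(2, 32, 128, 16_000_000)

# At least three overloaded windows in 20 seconds of simulated time.
INTERVAL_S = mean_overflow_interval_s(20.0, 3)
```

Note that the drain time grows in direct proportion to the number of entries, so a deeper FIFO buys a longer window in the same measure that it needs one.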

On the other hand, if only data bits 
are transferred, we get a different 
result. In this case, we can think of a 
128-byte-wide entry as merely an
addressing convenience, and that when
data moves from these entries to the 8089 
buffer, only data, time stamp and 
bus-label bits are taken. When operated 
in this fashion, we can ignore the frame 
rates and concentrate simply on the bit 
rates associated with the data captured 
from the buses. 

For the same load used in the above
experiment, we observed: that the
average number of bytes captured from the
Hyperchannel frames was about 45
bytes/frame; that the peak data rate for
both buses combined was 60 Mbps, for a 4
millisecond period; that the peak capture
rate for both buses combined was about 8
Mbps for a 4 millisecond period; and that
the average data rate for the captured
data over the 20 seconds of the
experiment was about 380 kbps.

Since the FIFOs are emptied at 16
Mbps, it does not appear that two 32
entry FIFOs will ever overflow for the
loads modelled, even when we consider the
extra load created by the addition of the
6 bytes of time stamp and bus label.
That is, each captured 45-byte record is
expanded to 51 bytes, thus resulting in
an apparent FIFO-to-8089 buffer bandwidth
of about 45/51sts, or 88%, of its actual
value.
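
The apparent bandwidth can be worked out directly from the ratio of captured bytes to bytes moved. The following is a minimal sketch using the 16 Mbps, 45 byte, 6 byte and 8 Mbps values from the text; the names are hypothetical.

```python
def apparent_bandwidth_bps(raw_bps, captured_bytes, overhead_bytes):
    """Share of a path's raw bandwidth left for captured data once
    per-record overhead bytes travel along with it."""
    return raw_bps * captured_bytes / (captured_bytes + overhead_bytes)


FIFO_TO_8089_BPS = 16_000_000
PEAK_CAPTURE_BPS = 8_000_000  # peak capture rate for both buses combined

APPARENT_BPS = apparent_bandwidth_bps(FIFO_TO_8089_BPS, 45, 6)
FRACTION = APPARENT_BPS / FIFO_TO_8089_BPS  # the 45/51 ratio

# The path keeps up as long as the apparent bandwidth exceeds the peak.
KEEPS_UP = APPARENT_BPS > PEAK_CAPTURE_BPS
```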

How often do peaks occur on a
Hyperchannel bus under a normal load, a
load that we have tried to model given
the data plotted in the figures within
this report? We can not answer this
question. This is one of the questions
we hope to answer once we have the HMD.
Unfortunately, it is one of the things we
needed to know in order to properly
design it. However, if we subject the
simulated network to the steady state
load described herein, contention peaks
occur. A contention peak is one that
occurs due to the randomizing effects of
the contention algorithm upon bus access
granted to various nodes with various
size messages to send.

The model does give us a feeling for
what the magnitude, frequency and
duration of these contention peaks are.
The two buses together peak at a rate of
60 Mbps for a duration of 4 milliseconds,
once every 20 seconds. Lesser peaks
occur more frequently. For example, a 40
Mbps peak of 4 milliseconds duration
occurs once every 0.2 seconds.

The next questions we answer are, 
What was the average rate at which bits 
were captured during the simulated time? 
and How does this compare to the 
8086-to-tape write rate? 

During the 20 seconds of simulated
operation, data was captured at a rate of
380 kbps. Again, the average number of
captured bytes per Hyperchannel frame is
45. These captured data bytes are
augmented with 6 bytes of time stamp and
bus label, expanding the amount of data
destined for tape from 45 to 51 bytes.
However, to maintain simplicity
in the subsequent retrieval of data from
this tape, these 45 (plus the 6 above)
bytes flow onto tape in 83-byte records
(or multiples of 83-byte records). We
recall that simplicity is often expensive
as we note that embedding 45 bytes of
data in 83-byte records acts to diminish
the effective bandwidth of the tape, i.e.
a reduction from 1.6 Mbps to .867 Mbps.
Fortunately for us, 867 kbps is more than
adequate to keep up with the average
capture rate of 380 kbps.
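
The reduction in tape bandwidth is the ratio of useful bytes to record size applied to the peak tape rate. A short sketch, using only figures given in the text and a hypothetical function name:

```python
def effective_tape_bps(tape_bps, avg_captured_bytes, record_bytes):
    """Tape bandwidth that carries captured data when each frame's
    data is padded out to a fixed-size logical record."""
    return tape_bps * avg_captured_bytes / record_bytes


TAPE_BPS = 1_600_000         # peak 8086-to-tape rate
AVG_CAPTURE_BPS = 380_000    # average capture rate over 20 simulated seconds

EFFECTIVE_BPS = effective_tape_bps(TAPE_BPS, 45, 83)

# Spare capacity relative to the average load.
HEADROOM = EFFECTIVE_BPS / AVG_CAPTURE_BPS
```

The same function shows how the margin shrinks when frames carry less data: with fewer captured bytes per 83-byte record, the effective rate falls in proportion.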

What we need to know next is the
size of the 8086 buffer required to
handle the peaks likely to occur. Again,
since we know nothing about these peaks,
the best we can do is explore the effects
of contention peaks. For the sake of
argument, let's postulate an 8086 buffer
that requires one second to write to
tape, realizing that the rate at which
data is written to tape depends on a lot
of factors such as multibus availability
and the average number of bits
captured per Hyperchannel frame. We
simulated the load described herein,
looking at .5 second windows, trying to
determine if any of these windows
contained more captured bits than would
fit into half of some size 8086 buffer.
That is, we are simultaneously trying to
determine the size of the 8086 buffer and
the effective 8086-to-tape transfer rate.

We found that during 205 seconds of
simulated time, the average captured data
rate was 412 kbps and that the average
number of bytes captured per Hyperchannel
frame was 45. The fact that this latter
average is so high means that data could
move to tape at more than half (45/83rds)
its peak rate, or at .867 Mbps. This
implies that an 8086 buffer that is as
large as or larger than .867 megabits
could keep up with the contention peaks
associated with the steady state load
simulated. Further examination of
simulated results shows that of the 410
.5 second windows in this experiment,
none contained more than 350 kbits.

The same experiment was conducted 
assuming an 8086 buffer half as large as 
above, i.e. 512 kbits, capable of being 
written to tape in .5 seconds. Looking
at .25 second windows, we observed that 
about 1% of them contain more than 256 
kbits, i.e. more than half of the 8086 
buffer. This implies that 1% of the time 
a 512 kbits buffer would fail to buffer 
the input peaks. 
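
The windowing test used in these two experiments can be sketched as follows. The sketch assumes the captured traffic is available as (time, bits) pairs, which is a hypothetical representation of the simulation output, and it takes the double buffering rule from the text: a window of half the buffer's drain time must not capture more than half the buffer.

```python
def windows_exceeding(captures, window_s, limit_bits):
    """Count windows whose captured bits exceed a limit.

    captures: iterable of (time_s, bits) pairs (hypothetical format).
    Returns (number of windows over the limit, total number of windows).
    """
    totals = {}
    for time_s, bits in captures:
        index = int(time_s // window_s)
        totals[index] = totals.get(index, 0) + bits
    if not totals:
        return 0, 0
    n_windows = max(totals) + 1  # windows with no traffic still count
    n_over = sum(1 for bits in totals.values() if bits > limit_bits)
    return n_over, n_windows


def double_buffer_failures(captures, buffer_bits, drain_s):
    """Apply the rule from the text: one half of the buffer fills while
    the other half is written to tape."""
    return windows_exceeding(captures, drain_s / 2, buffer_bits / 2)


# Made-up trace, checked against a 512 kbit buffer drained in 0.5 second.
sample = [(0.1, 100_000), (0.2, 200_000), (0.3, 50_000), (0.6, 10_000)]
over, total = double_buffer_failures(sample, 512_000, 0.5)
```

The sample trace is invented purely to exercise the function; it is not simulation data from the study.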

One final observation is that as the 
number of bits per message presented to 
the Hyperchannel decreases, the average 
number of bytes captured per Hyperchannel 
frame decreases. The implication is 
clear; the effective 8086-to-tape 
bandwidth will decrease as a consequence. 
That is, the tape will be less likely to 
keep up for two reasons; data (mainly 
protocol frames) is arriving too quickly, 
and the transfer bandwidth is diminished. 

I think I can conclude from all of
this that for the loads modelled, any
reasonably large 8086 buffer, e.g. 1024
kbits, should enable the tape to keep up
with the Hyperchannel, assuming that the
multibus can accommodate a doubly
buffered 8086 buffer.

Those were the major design
questions we had to answer before we
could have any confidence that a monitor
could be built, a monitor that would
capture the desired data almost 100% of
the time. It is possibly worth noting
two things in passing. First, the above
use of the model is a premier example of
the cyclic design process depicted in
figure 1. For I have described how the
network model helped design the network
monitor, the monitor which itself
subsequently will help validate the
model. And second, once there is a
computer simulation, it is relatively
easy, inexpensive, and informative to
perform a peripheral study such as this
one.



4. Measure 

There are at least two ways to 
measure the performance and behavior of a 
broadcast network. One can install 
software network wide in every host 
operating system and thereby measure and
monitor every host's network related 
activity. However, he would have to 
design such software uniquely for each 
different host system. Presumably the 
results of such monitoring would somehow 
be collected later at some central point 
for analysis and interpretation. 
Alternatively, one can simply just 
monitor the broadcast network's 
transmission medium. 

Each of these two techniques has its 
advantages and disadvantages. For 
example, using host resident software, it 
is easy to determine network message 
queues and host to host transmission 
delays, and network throughput on a per 
host basis; but it is almost if not in 
fact totally impossible to synchronize 
the initialization of host monitoring 
software network wide so that all the 
measurement periods cover exactly the 
same period of real time. In addition, 
if the network itself is used for the
continuous collection of measurement data
at a central site, then the network's
performance will have been affected by
the measurement process. Finally, if
network monitoring software was not
included in the design of each host's
operating system at its outset, adding it
later is usually prohibitively expensive
if not impossible to accomplish given a
diversity of hosts such as in the
Craynet. On the other hand, using the
second alternative, medium monitoring,
network message queue length information
is inaccessible, and transmission delay
can only be inferred and then only during
periods of low loads (no message queues).
However, network throughput information
is precisely knowable for arbitrarily
small or large measurement periods.
Ideally, one would proceed by using both 
techniques, and indeed we at LLNL are 
attempting to measure the Craynet by a 
composite of both methods. 



4.1 Software monitor 

What follows next is a brief 
description of the kinds of things we've 
been able to measure so far using 
monitoring software residing in some (but 
by no means all) hosts. Following that 
is a commentary upon and evaluation of 
our results so far.

The Craynet is currently monitored
by the systems of some of the host
computers connected to it. The data 
accumulated by these host systems is 
periodically and routinely collected, 
analysed and plotted. Much useful 
information is thereby available 
concerning the use that the various hosts 
make of the network. The Cray operating 
system monitors all of its own traffic 
into and out of the Craynet. It is
neither practical nor desirable to present all of
the results of the data thus collected.
However, I shall present some of the more
interesting results pertaining to
peripheral I/O, terminal message, and file
transfer activity. The plots, figures 7
to 17, were obtained from just the Cray-C 
machine, the most heavily used of the 
four Cray computers. 



In figure 7 is plotted the number of
user-job initializations/minute during a 
24 hour period. The number of 
initializations peaks at a value of 
65/minute just before lunch. 

In figure 8 is plotted the number of 
user-jobs loaded into main memory for 
execution under timesharing per minute 
during a 24 hour period. The number of
loads peaks at a value of 350/minute just
before lunch. 

In figure 9 is plotted the number of
output files generated per minute during
a 24 hour period. The number peaks at a
value of 4/minute at 7 pm. This peak
occurs then because a lot of low
priority output files, which are
accumulated during the day, are
off-loaded at night. Incidentally, some
of these files are sent over the
Cray-CHORS path (see table 2 and figure
4).

Figure 7. The number of first-time initializations of
user programs per minute.

Figure 8. The total number of user program
initializations per minute.

Figure 9. The total number of user/program generated
output files per minute.

Figures 10, 11 and 12 plot data that 
pertains to disc file creations and data 
transfers to/from the Cray machines'
disks. It is interesting to note, figure
12, that the transfer rate reaches a peak
value of 50 megabits/second, a peak that 
is maintained for hours at a time. None 
of this disc activity involves the 
Craynet or Hyperchannel. 

The previous plots and discussion
are intended to give the reader some
feeling for the kinds and intensity of
activities taking place on the Cray
machines. It is one of the purposes of
the NBS study to somehow relate this
activity and intensity to the traffic
measured on the Craynet. However, this
work can not continue until the
Hyperchannel monitor device becomes
available.

Figure 10. The total number of files created by
user/utility programs per minute.

Figure 12. The total number of bits/second of disk traffic
into/out of CRAY-C machine.

Figures 13 and 14 depict the
terminal traffic in and out of the Cray-C
machine over a 24 hour day. These
figures show that the maximum input rate
is about 350 messages/minute (8000
bytes/minute). These messages arrive at
the Cray-C machine over a variety of
paths from the different terminal
concentrators (see table 1). It is not
possible to distinguish amongst the
contributions of these various paths to
this data.

The television monitor display
system (TMDS) provides the users with
various display facilities at their
terminals. By far the most common use to
which the TMDS is put is in the area of
word processing. The display of textual
data requires relatively small network
messages, i.e. 32 kilobits/message. The
TMDS is also used for the display of
graphical or plotted data usually
generated during the nighttime production
run of some physics calculation. The
messages required to support such
pictures contain about 256 kilobits each.
In figure 15, we see displayed the rate
of data flow over a 24 hour day from the
Cray-C machine to the TMDS.

File transfers, in bits/second, to
and from the MASS storage device are
plotted in figure 16, and reach a peak
value of 3.5 megabits/second around 3 pm.
The maximum network message associated
with such file transfers contains around
200 kilobits.

The computer hardcopy output
recording system (CHORS) provides the
users with their principal output
service. It is to the CHORS utility that
files are sent when destined to become
paper, film or fiche. Figure 17 shows
the data traffic from the Cray-C machine
to the CHORS utility adapter. This path
peaks at a value of 210 kilobits/second.

Figure 11. The total number of disc accesses per second
caused by user/utility/system programs.

Figure 13. The total message traffic to/from CRAY-C
machine for all terminal concentrators.

Figure 14. The total message traffic, in kilo bytes/
minute, into and out of the CRAY-C machine for all
concentrators.

Figure 15. The total traffic, in kilo bits/second, of
text and picture data into the TMDS.

Figure 16. The total file transfer, in mega bits/second,
into/out of the CRAY-C machine to/from MASS and
ATL combined.

Figure 17. The total file transfer, in kilo bits/second,
to the CHORS from the CRAY-C machine.

Table 2 contains a summary of this
data and similar data obtained for the
other Cray computers. In every instance,
peak values are used. This table
indicates the degree to which we
understand each path. What is not
depicted in the figures and tables so far
is the distribution in time and size of
the messages on these paths. It is not
depicted because we do not know it. We
do know the maximum, minimum and type of
the messages on most of these paths. We
do know, figures 13 to 17, qualitatively
how the load/utilization varies during
the day on most of these paths.



4.2 Medium monitor 

If we had the Hyperchannel monitor 
device, the HMD described earlier, how 
would we use it and what sense could we 
make of the data collected by it? 

The measurement process is simply 
described. The HMD is enabled, allowed 
to collect as much data as required to 
achieve statistically meaningful results,
and then the collected data is analyzed. 

The tape is written with the 
selected portions of 390 frames per 
record. It represents therefore a time 
history of the state transitions of every 
active adapter in the network. To be 
more specific, the header of each frame 
contains the destination adapter address, 
the source adapter address, frame 
function and length of data associated 
with the frame. These four pieces of 
information together with the time-stamp 
allow us to accurately follow the state 
transitions of every active adapter in
the network. Using the transmission 
protocols given in figures 18a and 18b, 
it is possible to determine the 
beginning, length and end of each message 
transmitted in the network as well as its 
path (source/destination address-pair). 
For a large sample of messages, we could
expect to determine the message 
throughput and delay associated with each 
path through the network at any moment of 
the day. However, this point needs 
elaboration. 
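
To make the idea concrete, here is a minimal sketch of how per-path throughput might be tallied from the taped records. The record layout (time stamp, source adapter, destination adapter, data length in bytes) is a hypothetical simplification of the header fields listed above, and the sketch ignores frame function, retries and aborts, which are exactly the complications taken up next.

```python
from collections import defaultdict


def path_throughput_bps(records):
    """Average throughput in bits/second for each (source, destination) path.

    records: iterable of (time_s, source, destination, data_bytes)
    tuples, a hypothetical reduction of each taped frame record.
    """
    bits_by_path = defaultdict(int)
    first = last = None
    for time_s, source, destination, data_bytes in records:
        bits_by_path[(source, destination)] += 8 * data_bytes
        first = time_s if first is None else min(first, time_s)
        last = time_s if last is None else max(last, time_s)
    if first is None or last <= first:
        return {}  # need at least two distinct time stamps
    span_s = last - first
    return {path: bits / span_s for path, bits in bits_by_path.items()}


# Made-up records: adapter 1 to adapter 4 twice, adapter 4 to adapter 1 once.
sample = [(0.0, 1, 4, 1000), (1.0, 1, 4, 1000), (2.0, 4, 1, 500)]
rates = path_throughput_bps(sample)
```

The adapter numbers and byte counts in the sample are invented for illustration and do not come from the Craynet measurements.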

For the sake of the discussion that
follows, let us define the Hyperchannel
monitor (HM) as consisting of the HMD
together with an output tape and the tape
analysis program. Barring collisions,
for any given frame from some source
adapter, there is usually a corresponding
response frame. A response frame can
either be favorable (an ack) or
unfavorable (a reject). In the case of a
definite response, the HM will detect it
and modify the state of the adapters
involved. The lack of a response will be
inferred by the HM by observing the
behavior of the unsatisfied adapter, e.g.
it will retry the unacknowledged frame.
After some suitable number of retries, an
adapter will abort the message
transmission. Aborts can occur at any
point in the transmission protocol, and
the HM can detect (infer) them because of
the transmitting adapter's deviation from
it. A host may reattempt the message
transmission upon receiving an abort
indication from its attached adapter.

Even though the HM is unable to
distinguish between new and reattempted
messages, it does know not to count the
bits of an aborted, partially transmitted
message. Consequently, while the HM can
determine the network utilization, it can
not determine the message delay (host to
host transmission delay). It can
determine the message transmission delay 
associated with successful transmissions 
only. We could think of this as the 
"transmission delay", conditioned by the 
number of message aborts. It may be 
possible depending on the protocol 
involved to know the length of an aborted 
message. 

Once the path behavior is 
sufficiently quantified, it would be 
possible through various statistical 
methods to determine such things as whether
and when there are any mutual interactions
between paths. 

So far, the network monitoring 
software resides in the Cray OSs only, 
and there are no plans to add a similar 
facility to the other hosts in the 
Craynet. In addition to this, since the 
monitoring software is resident at all 
times in the Cray OSs, efficiency and 
space considerations severely limit its 
capabilities and frequency of execution. 
Consequently, measurements are reported 
for half hour collection periods. That 
is to say, finer granularity is not 
possible without impacting host 
performance in unacceptable ways. It's 
fair to say that half hour wide 
collection periods do not provide a 
characterization of the instantaneous 
behavior of a network. So far, our use
of monitoring software is at best
incomplete (not resident in all network
hosts) and too coarse to allow proper
characterization of network traffic with
the detail required for the NBS study.

With respect to the medium 
monitoring approach, as of this writing, 
our outside vendor has not yet completed 
work on the Hyperchannel monitor. This 
$175,000 piece of equipment, originally 
to be built in six months as part of a
research contract with and funded by the
NBS, is 13 months past scheduled
completion. And because it is 13 months
behind schedule, the success of this
research project has been seriously
jeopardized. Not only that, the absence 
of this monitoring equipment has eroded 
the case for the design-model-measure 
methodology I have been trying to build 
within my own group. 

In any event, whereas we suffered 
basically from too little data using the
software monitor approach, we will 
probably suffer from too much data using 
the medium monitor approach. In ten 
minutes of monitoring the Hyperchannel 
under a 5% load, enough data will be 
collected to fill a standard size 
magnetic tape. It will probably take 
more than an hour to analyze this much 
data using something like a Digital 
Equipment Corporation VAX/780. This 
disparity between sample width and 
analysis time will prove to be a
troublesome but not insurmountable 
obstacle in the successful and timely 
completion of the NBS research project. 

5. Model Validation 

The model, which simulates Network
Systems Corporation's (NSC) Hyperchannel,
is the same one used extensively in the
past to produce much useful knowledge
pertaining to certain inadequacies and
interactions of the lower level
protocols [4,5] and inadequacies of the
adapter management and buffering
strategies [6], and to probe the nature of the
interactions of network traffic,
configuration, and these protocols [7,8]. We
have validated this model to a certain
extent [9] and so have great confidence in
it. However, given the nature of the
current investigation and the importance
of the role the model will play, we feel
that it is necessary to formally and
extensively demonstrate the model's
validity. Once we have determined that
it accurately portrays network
performance at low to medium loads, we
will use the model to study performance
and traffic characteristics at medium to
high loads, a kind of computational
extrapolation.

We resort to this extrapolation
technique because the Craynet as
currently configured is not able to
generate the level of traffic required
for the complete characterization of
network traffic and performance called
for in the NBS study. Consequently, our
approach will be to measure and
characterize the traffic that exists in
the current, operational Craynet in
numerous data transfer situations and
then use the validated model to study its
performance at higher loads.

The rest of this section will detail 
the process of model validation, briefly
reviewing previous studies and papers 
exploring the possibility of using their 
results in this process. The validation 
process will make extensive use of the 
Hyperchannel Monitor Device (HMD). 

The first step will be to determine
that the sequence and timing of the
frames of the lowest level protocol,
figures 18a and 18b, exchanged in a
simple two adapter configuration are
exactly duplicated in sequence and
statistically and acceptably approximated
in timing. This step can be performed on
either the Cray network or a test 
network. 

As a second step, the model will
attempt to predict network performance in
the three node, two data path
configuration of figure 19. In the past,
we have studied this configuration, known
as the message switch, extensively
because of its sensitivity to how adapter
buffers are managed and to certain
aspects of the lowest level
protocols [6,7,8,9]. Specifically, the high
data transfer path will operate at its
maximum rate, while the slow path will
attempt to transmit at various, slower
rates. Simulation results [7] have shown
that the fast and slow paths interfere
with each other under these
circumstances. We will determine the
message throughput and delay of the two
paths experimentally, using the HMD
and/or processes residing in the attached
hosts. Again, simulated performance must
approximate measured performance in a
statistically acceptable way. We will
probably perform this step in a test
network owing to the impossibility of
dedicating the Craynet to experiments
even of brief duration. A test network
will probably consist of three or four
nodes and one bus, where the nodes will
consist of a 7600 PPU/A120, a PDP11/A400,
a SEL/A470, or a VAX/A400. However, it
may be possible to conduct the message
switch and other experiments on the
Craynet during periods of minimal
activity, collect the data with the HMD
and subsequently identify and verify
those periods of time within the sample
in which the strict constraints of the
original experiment were met.

Figure 19. Uni-directional message switch.

The last step in the model 
validation process will be to measure and 
model the sequence and timing of 
exchanged frames in a four node, two path 
configuration. Again, the goal is to 
verify that the model exactly duplicates 
the sequence of exchanged frames and 
suitably approximates their timing. 
Experiments involving more nodes and
buses are probably not possible since we
cannot interfere with the normal
functioning of the Craynet.

In a previous paper [7], I reported on
the effect of segregating messages
according to size, to wit, long and short
messages were each transmitted over a bus
dedicated to their respective sizes. The
hypothesis was that somehow these two
sizes (and types) of messages interfered
with each other. The results reported in
[7] do not support such a hypothesis; in
fact, they indicate the contrary, i.e.
the failure to segregate messages in this
manner has no effect on network
performance. It will not be possible to
validate these results [7] using the HMD
owing to the impossibility either of
reproducing the configuration in [6], a
five node two bus network, as a test
network, or of modifying the host systems
in the Craynet to behave in this manner.

Yeh and Donnelley [5] modelled the NSC
Hyperchannel (protocols and hardware)
as it existed in 1978. Their simulations
detected deadlocks and node priority
reversals. Because NSC has long since
remedied these shortcomings through
modifications of adapter architecture and
microcode, I see no possible way of using
the HMD either in a test network or in 
the Craynet itself to validate their 
results. 

Yeh and Donnelley [4] studied their own
solution to these deadlock/reversal
problems, viz. the recycling timer, and
compared its effects to those produced by
NSC's modifications, viz. wait flags and
a binary exponential retry mechanism. No
Hyperchannel experiment could validate
the results obtained using their solution
since this would require extensive
modifications of Hyperchannel 
architecture and microcode. 

Model validation and performance
monitoring share many of the same
problems. In order to validate a model,
the thing modelled must be subjected to
experiments in which it is controlled in
some precise way, a way that can be
duplicated by the model. In the case of
networks, this usually entails modifying
the operating system of the hosts
involved in the validation experiment so
that they use the network in some
computationally meaningful way, e.g.
submitting messages to the network that
have nice distributions in size and
interarrival time. Not only must the
experimenter control the presented load
precisely, he must also have some way of
measuring the network's response to the
load with equal precision. Again, one
can use a medium monitor and/or
system resident software to do this. If
one decides to use system resident
software, he is presented with the
additional problem of how to synchronize
the various hosts in an experiment so
that they all use the same time interval.
If one relies instead upon some kind of a
hardware monitor, he may be prevented
from knowing certain aspects of the
network's performance, e.g. message
delay, which ultimately translates into an
only partial validation of the model.
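
A controlled, reproducible load of the kind described above might be sketched as follows. This is only an illustration, not the software used in the experiments: it assumes exponentially distributed interarrival times and message sizes (one possible reading of "nice distributions"), and the names `generate_load` and `offered_load`, as well as every parameter value, are hypothetical. Seeding the generator lets the identical message schedule be presented both to the real network and to the model.

```python
import random


def generate_load(n_messages, mean_interarrival, mean_size_bits, seed=0):
    """Return a reproducible list of (submit_time_seconds, size_bits) pairs.

    Assumption of this sketch: interarrival times and sizes are both
    exponentially distributed. The seed makes the schedule repeatable.
    """
    rng = random.Random(seed)
    t = 0.0
    schedule = []
    for _ in range(n_messages):
        t += rng.expovariate(1.0 / mean_interarrival)
        size = max(1, round(rng.expovariate(1.0 / mean_size_bits)))
        schedule.append((t, size))
    return schedule


def offered_load(schedule, capacity_bps):
    """Fraction of a (hypothetical) channel capacity the schedule presents."""
    duration = schedule[-1][0]
    if duration <= 0:
        raise ValueError("schedule must span a positive time interval")
    total_bits = sum(size for _, size in schedule)
    return total_bits / (duration * capacity_bps)
```

Because the schedule depends only on its arguments, a host-resident driver and the simulator can each regenerate it independently rather than exchanging trace files.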

Even though we do not yet have a 
monitor, we have been able to validate 
our model using system resident 
software [9]. The problems here were finding
host computers that could be diverted
from their normal function long enough to
conduct an experiment, and finding a
willing systems programmer for each host
involved. I was able to do both, and in
our one and only set of validation
experiments, the message switch, figure
19, alluded to earlier, we strung
together two CDC-7600 PPUs and a Systems
Engineering Laboratory (SEL) 32/75 and
their appropriate adapters. The
operating systems in both the PPU and SEL
computers are written in assembly
language. Fortunately, the PPU operating
system had already been modified for use 
as part of the Hyperchannel acceptance 
test procedure. All that remained was to 
modify the operating system of the SEL 
host. This was done, experiments were 
run subsequently and the results are 
reported in [9]. Given the great paucity 
of statistical routines in both operating 
systems (no random number generators, 
mean or variance calculations), and that
they were written in assembly language, 
and that both the system programmer and 
network technician were volunteering 
their 'spare' time to this effort, we had 
to settle for less than scientifically 
definitive, but nonetheless gratifying, 
results. 

A truly and scientifically
definitive validation of our model
occurred almost incidentally when Franta
and Heath [10] at the University of
Minnesota validated their own analytical
model of the Hyperchannel. They were
able to realize the ideal in model
validation experiments, for they had at
their disposal a 6 node Hyperchannel
network over which they had total
control: they provided their own host
system software; they were able to modify
the adapter microcode to suit their
needs; they had the use of a passive,
medium monitor; and they had absolute
control of and knowledge of the load
presented to their test network. In
addition to validating their own model,
they were intent on studying the
performance of the Hyperchannel,
identifying its critical parameters and
metrics.

It was a simple task for us to 
simulate their experiments and compare 
our results with theirs [9,10].

And at what cost all of this? The 
software alluded to above required about 
two and a half man years. All 
Hyperchannel hardware, microcode 
modifications and engineering support 
were supplied by Network Systems 
Corporation free of charge. And as for 
actual funds used then, I do not know nor 
am allowed to know or report the source 
or the amount of the research grant 
supporting this project. However, I have 
been told that within university 
environments and as grants go, this one 
was small. 



6. Extrapolate 

As I have explained earlier, the
Craynet has never been able to generate
loads greater than 5% of Hyperchannel
capacity with normal traffic on a
sustained basis. As a consequence, we do
not have and will probably never have any
real experience with Hyperchannel
performance at medium to high loads (>
50% of capacity, what I have termed
hypothetical or extrapolated loads).
However, we have already used the model
to study its behavior at these
hypothetical or extrapolated
loads [4,5,6,7,8,9]. In fact, most of the
shortcomings and inadequacies of the
Hyperchannel alluded to throughout this
paper occur only during such loads. Much
of this early work represents what I have
termed extrapolation: the exploration of
realms of performance unreachable by
means other than computer simulation.
This is the more or less traditional use
to which computer simulation is put, and
presumably what is learned thereby helps
one subsequently to actualize designs
capable of such performance.

Extrapolation can be used in another
sense here, not just in regard to
studying performance at hypothetical
loads of existing and/or experimental
designs. The other sense of the word has
to do with modelling or simulation
itself. The layered nature of network
functionality, both hardware and
protocols, lends itself naturally to an
incremental model construction
technique [11]. That is, once we have
confidence in the simulation and
modelling of the lower layers, we are
able to add onto it models and
simulations of higher layer
functionality. Easier said than done.
One may be justified in adding new layers
to the model in that he has great
confidence in its validity, but most
computers and operating systems could
care less how justified you feel or how
much confidence you have in your own
programs. Once your program no longer
fits in memory, it no longer runs. Once
your program grows to monstrous
complexity, such large amounts of real
time are required to produce results that
it begins to lose significance as a part
of the design-model-measure paradigm.

The growth of complexity and use of
computer resources in modelling
increasing layers of protocols stems
directly from the mutual interaction of
these layers themselves and the impact of
this interaction on overall performance.
For example, in the Hyperchannel each
host to host message requires the sending
host to undergo a three step interaction
with its Hyperchannel adapter before the
data (the original message) can cross the
interface between them. Once that step
is accomplished, the sending host's
adapter and receiving host's adapter go
through a minimum of 8 steps to actually
move the data between them. The
receiving adapter must then go through
some protocol to move the data it's just
received (the original message) up into
the receiving host. A considerable
amount of amplification can result as a
message passes downward through layers
and layers of protocols both in the
number of bits (lower layers encapsulate
higher layer's data in passing it
downward to the next layer) and in the
number of interactions (hand shakes,
buffer negotiations, exchanges of
remembered state, acks etc.).
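
The two kinds of amplification described above can be tallied with a few lines of arithmetic. The sketch below is illustrative only: the step counts 3 and 8 are taken from the text, while the receive-side step count and the per-layer header sizes are hypothetical placeholders, and `amplification` is a name introduced here.

```python
def amplification(message_bits, header_bits_per_layer, steps_per_stage):
    """Return (bit amplification factor, total interactions) for one message.

    header_bits_per_layer: bits each lower layer adds when encapsulating.
    steps_per_stage: protocol steps at each stage of the transfer.
    """
    if message_bits <= 0:
        raise ValueError("message_bits must be positive")
    wire_bits = message_bits + sum(header_bits_per_layer)
    return wire_bits / message_bits, sum(steps_per_stage)


# 3 host/adapter steps and a minimum of 8 adapter/adapter steps are from
# the text; the final 3 (receive side) and the header sizes are assumed.
factor, interactions = amplification(
    message_bits=8_000,
    header_bits_per_layer=[512, 128],
    steps_per_stage=[3, 8, 3],
)
```

Every interaction counted here is an event the simulator must schedule, which is one way to see why each added protocol layer multiplies simulation run time.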
How does the modeller/simulationist
proceed in this matter given this fact of
life? One can proceed by means of what
I have termed extrapolation, in which that
part of a model dealing with the lower
layers is replaced by something simpler
that imitates (simulates?) the behavior
of the lower layers. To be more
specific, our approach will use data
derived from Hyperchannel monitor studies
to characterize Hyperchannel behavior,
i.e. to determine the message delay
distribution as a function of path,
message size and ambient load, within the
network. We will also use the existing
model to tell us what kinds of message 
delay distributions occur at extrapolated 
loads. Then we will build a new model 
that simulates the functionality of host 
to host message traffic and the 
appropriate higher layers of protocols, a 
model in which we replace the lowest 
layers of protocols and hardware (the 
Hyperchannel itself) with these delay 
distributions. 
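
One way to picture this substitution is a stub that stands in for the Hyperchannel and simply draws a delay from a measured (or simulated) distribution. The sketch below is an assumption-laden illustration, not the planned model: the class names `EmpiricalDelay` and `LowerLayerStub`, the key structure (path, size class, load level), and the sample values are all hypothetical.

```python
import random


class EmpiricalDelay:
    """Samples uniformly from a set of observed message delays (seconds)."""

    def __init__(self, samples):
        if not samples:
            raise ValueError("at least one delay sample is required")
        self.samples = sorted(samples)

    def sample(self, rng):
        return self.samples[rng.randrange(len(self.samples))]


class LowerLayerStub:
    """Replaces the detailed lower-layer model with delay distributions.

    tables maps (path, size_class, load_level) to an EmpiricalDelay built
    from monitor data or from runs of the detailed model.
    """

    def __init__(self, tables, seed=0):
        self.tables = tables
        self.rng = random.Random(seed)

    def deliver(self, send_time, path, size_class, load_level):
        """Return the time at which the message reaches the receiving host."""
        delay = self.tables[(path, size_class, load_level)].sample(self.rng)
        return send_time + delay
```

A higher-layer protocol model would call `deliver` wherever it previously invoked the frame-by-frame simulation, trading fidelity in the lowest layers for a much smaller program.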



7. Conclusion 

I have described a cyclic, 
design-measure-model methodology 
currently being used to study a large,
Hyperchannel based LAN at the Lawrence 
Livermore National Laboratory. This NBS 
sponsored project expects to characterize 
this network's traffic and corresponding 
performance. However, many questions 
remain concerning how best to proceed in 
this matter. 
Abstract:



This paper concerns itself with the determina- 
tion of the steady-state queue length distribu- 
tions at very fast merger nodes that are present 
within queueing networks. We study a network with 
a tree topology in which a given server at a node 
of the network provides each of its customers with 
an equal, constant amount of service time. A very
fast merger node is defined as a node at which the 
service rate is greater than or equal to the sum 
of the service rates of the channels feeding into 
the merger node. Inputs following general, in- 
dependent probability distributions are con- 
sidered. A condition for absolute stability at the 
merger node is derived. The exact queue length 
distribution is found for a subset of these fast 
merger nodes via a combinatorial analysis of pos- 
sible arrival patterns of customers into the 
merger node. 

I. Introduction: 

This paper concerns the study of the state
probabilities of the queueing network shown in
Figure 1. In Figure 1, each node shown represents
an infinite storage, FCFS queueing facility
equipped with a single server that has a constant
service rate of $\mu_i$ (customer/sec) (or service time
of $s_i$ sec/customer), i=1,2,...,n.

We assume that the network is being fed by
"n" mutually independent streams of customers
governed by some general probability distribution
with mean arrival rate of $\lambda_i$ (packets/sec.) for
the class i stream, i=1,2,...,n, $\lambda_i > 0$. Each
customer is considered to consist of a single,
fixed length packet of data which is to be
transmitted through the network. Class i packets
initially enter the network at node $N_i$ where they
are then processed and transmitted to node $N_{n+1}$
for additional processing. At node $N_{n+1}$, all n
packet streams are merged and processed by a single
common server. Hence, node $N_{n+1}$ is called a
merger node.

This work was partially supported by the National
Science Foundation under grant ECS-8105963 and by
PSC-CUNY Research Award 13665.

This paper is directed towards obtaining various
queue length characteristics at the merger
node, $N_{n+1}$. The discussion is limited to the case
where the service rate at $N_{n+1}$, $\mu_{n+1}$, is greater
than or equal to the sum of the service rates of
the feeder channels, i.e.,

$$\mu_{n+1} \ge \sum_{i=1}^{n} \mu_i \quad \text{or} \quad \frac{1}{s_{n+1}} \ge \sum_{i=1}^{n} \frac{1}{s_i}.$$

In [1,2], it has been shown that this condition on $\mu_{n+1}$ is
a sufficient condition for absolute stability at
the merger node when n = 2. Here, we extend this
result to general values of "n". In addition, in
those papers, for the subcase
$s_{n+1} \le \frac{1}{n}\min(s_1, s_2, \ldots, s_n)$ (or
$\mu_{n+1} \ge n \max(\mu_1, \mu_2, \ldots, \mu_n)$), n=2, the
steady-state waiting time statistics were derived. Here,
we turn to the queue length probabilities and once
again derive results for general values of "n".

II. Stability Condition for Very Fast Merger Nodes 
Our study begins with a derivation of an interesting
stability property for the case
$\mu_{n+1} \ge \sum_{i=1}^{n} \mu_i$. In earlier work [1,2], it was shown
that for the case n=2, the number of packets waiting
for service on queue could never exceed one.
This will now be generalized for arbitrary values
of "n".
Theorem 1:

For the case $\mu_{n+1} \ge \sum_{i=1}^{n} \mu_i$, at the merger node

$$W \le (n-1)\, s_{n+1} \tag{1a}$$

$$N_q \le n-1 \tag{1b}$$

where the quantities W and $N_q$ are defined as:

W = the waiting time (queueing time) of an
arbitrary packet at $N_{n+1}$

$N_q$ = the number of packets on queue at $N_{n+1}$

Proof:

The proof will be via a worst case analysis
which occurs when $\mu_{n+1}$ is set to its minimum
value; that is, $\mu_{n+1} = \sum_{i=1}^{n} \mu_i$. This value of $\mu_{n+1}$
yields the greatest probability for the queue
length to grow.

With $\mu_{n+1} = \sum_{i=1}^{n} \mu_i$, without loss of generality
one can sort the "n" parallel feeding channels
into ascending order of magnitude and obtain

$$s_i = k_i\, s_{n+1}; \quad i=1,2,\ldots,n \tag{2a}$$

where

$$\left[1-\sum_{j=1}^{i-1}\frac{1}{k_j}\right]^{-1} \le k_i \le (n+1-i)\left[1-\sum_{j=1}^{i-1}\frac{1}{k_j}\right]^{-1} \tag{2b}$$

and

$$\sum_{i=1}^{n} \frac{1}{k_i} = 1 \tag{2c}$$



A worst case arrival pattern into $N_{n+1}$ is now
constructed by assuming that class i packets are
arriving into $N_{n+1}$ at regular intervals of $s_i$
seconds. Thus, with

$t_{ij}$ = the arrival time into $N_{n+1}$ of the j-th
class i packet, i=1,2,...,n,

it follows that

$$t_{ij} = [x_i + (j-1)]\, k_i\, s_{n+1}; \quad i=1,2,\ldots,n; \quad j=1,2,\ldots; \quad 0 < x_i < 1 \tag{3}$$

where $x_i$ is a fractional offset from time zero for
the first class i arrival.

An induction procedure will now be used to
prove the theorem. This will be done by introducing
the notation,

$W_{ij}$ = the waiting time at $N_{n+1}$ of the j-th
class i packet,

and showing that $W_{ij} \le (n-1)\, s_{n+1}$ for all i and j.

To begin, Theorem 1 is trivially true for the
case n = 1. (In fact, in [1,2], it has been proven
true for n = 2.) It is now assumed that Theorem 1
is true for "n-1" channels feeding the merger
node. It must now be shown that this implies validity
of Theorem 1 for "n" channels feeding $N_{n+1}$
(as shown in Fig. 1).

With "n" feeding channels, it can be stated
that

$$W_{i1} \le (n-1)\, s_{n+1}; \quad i=1,2,\ldots,n \tag{4}$$

This is true because until the arrival of the
first class i packet, node $N_{n+1}$ sees an arrival
pattern from at most n-1 classes of packets
(excluding the class i stream) with $s_{n+1}$ less than
the parallel combination of the service times of
the feeding channels. Thus, (4) is true by the
induction assumption.

To complete the proof, we need only show
$W_{ij} \le (n-1)\, s_{n+1}$ for all j. We need concern ourselves
only with the case where the busy period
has been continuous from $t_{i1}$ through $t_{ij}$. Otherwise,
a renewal point of Process m would have
occurred, where Process m is defined as the process
whereby a class m packet initiates the
current busy period, m=1,2,...,n.

To derive an expression for $W_{ij}$, it is
observed that in the interval $(t_{i1}, t_{ij})$, there
will be j-2 additional class i arrivals and
$\lfloor (t_{ij}-t_{i1})/(k_m s_{n+1}) \rfloor$ class m arrivals, $m \ne i$.
Consequently,

$$W_{ij} = \Big[(t_{i1}+W_{i1}+s_{n+1}) + (j-2)\,s_{n+1} + \sum_{\substack{m=1 \\ m \ne i}}^{n} \Big\lfloor \frac{t_{ij}-t_{i1}}{k_m s_{n+1}} \Big\rfloor s_{n+1}\Big] - t_{ij} \tag{5}$$

$$\le \Big[(t_{i1}+W_{i1}+s_{n+1}) + (j-2)\,s_{n+1} + \sum_{\substack{m=1 \\ m \ne i}}^{n} \frac{t_{ij}-t_{i1}}{k_m s_{n+1}}\, s_{n+1}\Big] - t_{ij}$$

$$= W_{i1} + (j-1)\,s_{n+1} - (j-1)\,k_i s_{n+1} + \sum_{\substack{m=1 \\ m \ne i}}^{n} \frac{(j-1)\,k_i s_{n+1}}{k_m}$$

$$= W_{i1} + (j-1)\,s_{n+1} - (j-1)\,k_i s_{n+1} + (j-1)\,k_i \Big[1 - \frac{1}{k_i}\Big] s_{n+1}$$

$$= W_{i1} \le (n-1)\, s_{n+1}$$

where we have made use of both (2c) and (4).

This proves (1a). Equation (1b) follows
directly from (1a). Q.E.D.

Theorem 1 has direct implications on buffer size 
requirements at a very fast merger node. 
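
The buffer bound can be checked numerically for a particular configuration. The sketch below is an illustration rather than part of the proof: it assumes strictly periodic feeder streams (the worst case pattern used above), sets the merger service time to its largest permitted value, and uses exact rational arithmetic. The function name `simulate_merger` and the example service times are hypothetical.

```python
from collections import deque
from fractions import Fraction


def simulate_merger(service_times, offsets, horizon):
    """FCFS merger node fed by periodic streams; exact arithmetic.

    Stream i delivers a packet every service_times[i] seconds, the first
    at offsets[i] * service_times[i]. The merger service time is chosen so
    that its rate equals the sum of the feeder rates (the boundary case).
    Returns (merger service time, maximum wait, maximum packets on queue).
    """
    s = [Fraction(v) for v in service_times]
    s_merge = 1 / sum(1 / v for v in s)
    arrivals = []
    for si, xi in zip(s, offsets):
        t = Fraction(xi) * si
        while t < horizon:
            arrivals.append(t)
            t += si
    arrivals.sort()

    free_at = Fraction(0)      # time at which the server next becomes idle
    waiting = deque()          # service start times of packets still queued
    max_wait, max_queue = Fraction(0), 0
    for t in arrivals:
        start = max(t, free_at)
        free_at = start + s_merge
        while waiting and waiting[0] <= t:
            waiting.popleft()  # these packets have already entered service
        if start > t:
            waiting.append(start)
        max_wait = max(max_wait, start - t)
        max_queue = max(max_queue, len(waiting))
    return s_merge, max_wait, max_queue


s_merge, max_wait, max_queue = simulate_merger([2, 3, 6], [0, 0, 0], 60)
```

With three feeders of service times 2, 3 and 6 seconds starting in phase, the merger service time is 1 second and both bounds of Theorem 1 are met with equality, which suggests a buffer for n-1 waiting packets is exactly what this configuration needs.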

III. The Case $s_{n+1} \le \frac{1}{n}\min(s_1, s_2, \ldots, s_n)$ (or
$\mu_{n+1} \ge n \max(\mu_1, \mu_2, \ldots, \mu_n)$)

For the case $s_{n+1} \le \frac{1}{n}\min(s_1, s_2, \ldots, s_n)$, it
is intuitive that the number of packets serviced
within a busy period at $N_{n+1}$ can also not exceed
"n". (For example, if "n" packets arrived in a
common interval of $s_{n+1}$ seconds, then a second
packet belonging to any class can not arrive into
$N_{n+1}$ for at least $n s_{n+1}$ seconds from the arrival
time of its first packet. By that time, all "n"
original packets must already have been serviced
at $N_{n+1}$ and, hence, the busy period ended.) Furthermore,
each packet serviced in a busy period must
come from a different class.

The following theorem, concerning the queue 
length probabilities at node N n+1 , can now be 
stated 



Theorem 2:

For the case $s_{n+1} \le \frac{1}{n}\min(s_1, s_2, \ldots, s_n)$, the
equilibrium queue length distribution at a random
instant of time at node $N_{n+1}$ of Figure 1 is given
by

$$P_i = \sum_{j=0}^{n} \sum_{k=j}^{n} \frac{C_{ij} \binom{k}{j} (-n)^{k-j}\, V_k}{\binom{n}{j}} \tag{6}$$

where

$P_i$ = the equilibrium probability that node
$N_{n+1}$ contains "i" packets (both on
queue and in service), i=0,1,...,n;

$C_{ij}$ = the total number of arrival patterns
that will leave "i" packets in the
system at a random observation epoch,
given that there were "j" arrivals in
the ($n s_{n+1}$) seconds directly preceding
the observation epoch;

[Equation (6a), a multiple sum over L and $x_1, x_2, \ldots, x_{j-i-1}$ giving $C_{ij}$, is not legible in this copy.]   (6a)

where

$$y_k = i-2+k-\sum_{m=1}^{k-1} x_m; \quad k=1,2,\ldots,L$$

$$y_k = k-L-\sum_{m=1}^{k-L-1} x_m; \quad k=L+1,L+2,\ldots,j-i-1$$

(Note: see Appendix A for the derivation of $C_{ij}$)

$V_k$ = the sum of the products of $p_i$ taken "k"
at a time, k=0,1,...,n; that is,

$$V_0 = 1, \quad V_1 = \sum_{i=1}^{n} p_i, \quad V_2 = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} p_i p_j, \quad \ldots, \quad V_n = \prod_{i=1}^{n} p_i$$

where $p_i = \lambda_i s_{n+1}$, i=1,2,...,n.

Proof:

To begin the proof, Figure 2 illustrates the
random instant of time at which an observation is
to be made of the state of the system at node
$N_{n+1}$.

It has already been shown that the total
number of packets resident at $N_{n+1}$ can never
exceed "n". This fact implies that any packet
present in the system at our observation epoch
must have arrived within the immediately preceding
$n s_{n+1}$ seconds. Consequently, to illustrate this in
Figure 2, a time line has been drawn of length
$n s_{n+1}$ seconds extending back from our random
observation epoch (random time t). In addition,
the time line has been partitioned into "n" equal
intervals (or slots), each of length $s_{n+1}$ seconds.

A pause is taken to note that since
$s_{n+1} \le \frac{1}{n}\min(s_1, s_2, \ldots, s_n)$, the maximum length
of a busy period must be $n s_{n+1}$ seconds. Hence, a
renewal point, which for our purposes is a conclusion
of a busy period, must occur within the $n s_{n+1}$
seconds depicted in Figure 2.

Continuing with the proof, it is noted that
since $s_{n+1} \le \frac{1}{n}\min(s_1, s_2, \ldots, s_n)$, over the
interval $(t - n s_{n+1}, t)$, at most one packet from each
of the "n" different classes of packets arrived
into $N_{n+1}$. In addition, if a class "i" packet
arrives within this interval, it arrives with
equal probability within any of the "n" slots
shown in Figure 2. This is true simply because
our observation epoch is completely random. Consequently,
the total number of ways for "j" packets
to arrive at $N_{n+1}$ within the "n" slots preceding
the observation epoch, drawing from our population
of "n" different classes of packets, is

$$\binom{n}{j}\, n^j \tag{7}$$

Now, with $C_{ij}$ defined as above, it can be concluded
that the probability of observing "i"
packets in the system at the observation epoch
conditioned on the event that "j" packets arrived
in the last "n" slots is

$$\frac{C_{ij}}{\binom{n}{j}\, n^j} \tag{8}$$

Noting that the probability that a class "i"
packet arrives within any particular slot is given
by

$$p_i = \lambda_i s_{n+1}; \quad i=1,2,\ldots,n \tag{9}$$

and that the probability that a class "i" packet
arrives over the interval $(t - n s_{n+1}, t)$ is

$$n p_i = n \lambda_i s_{n+1}; \quad i=1,2,\ldots,n \tag{10}$$

it follows that the probability of having "j"
arrivals over the "n" slots of the interval
$(t - n s_{n+1}, t)$ is given by

$$(np_1)(np_2)\cdots(np_j)(1-np_{j+1})(1-np_{j+2})\cdots(1-np_n)$$
$$+\ (np_1)(np_2)\cdots(np_{j-1})(1-np_j)(1-np_{j+1})\cdots(1-np_{n-1})(np_n)$$
$$+\ \cdots\ +\ (1-np_1)(1-np_2)\cdots(1-np_{n-j})(np_{n-j+1})\cdots(np_n)$$
$$= \sum_{k=j}^{n} \binom{k}{j} (-1)^{k-j}\, n^k\, V_k \tag{11}$$

where $V_k$ is as defined above. This result, (11),
was derived by calculating all possible combinations
of having "j" different arrivals from the
"n" possible classes over the indicated interval.

Of consequence, using (8) and (11), it is found
that the joint probability of observing "i" packets
in the system at the observation epoch AND
having "j" arrivals within the last "n" slots is
given by

$$\left[\frac{C_{ij}}{\binom{n}{j}\, n^j}\right] \left[\sum_{k=j}^{n} \binom{k}{j} (-1)^{k-j}\, n^k\, V_k\right] \tag{12}$$

Finally, to conclude the proof, the unconditional
probability of observing "i" packets in the
system at the observation epoch is obtained. This
is done by summing (12) over all possible "j". The
result is

$$P_i = \sum_{j=0}^{n} \sum_{k=j}^{n} \frac{C_{ij} \binom{k}{j} (-n)^{k-j}\, V_k}{\binom{n}{j}} \tag{13}$$

which proves the theorem. Q.E.D.

With a bit of tedious algebra, one can now
proceed to determine the steady-state mean characteristics
at $N_{n+1}$. Defining

$\bar{N}$ = the steady-state mean number of arbitrary
class packets in the system at node $N_{n+1}$

$\bar{N}_q$ = the steady-state mean number of arbitrary
class packets on queue at node $N_{n+1}$

one finds

$$\bar{N} = \sum_{i=0}^{n} i\, P_i = V_1 + \sum_{k=2}^{n} [\,\cdots\,] \tag{14a}$$

and

$$\bar{N}_q = \sum_{i=0}^{n-1} i\, P_{i+1} = \sum_{k=2}^{n} [\,\cdots\,] \tag{14b}$$

which are pleasing results. (The summands over k in (14a) and (14b) are not legible in this copy.)

Little's Formula [3] can now be used on (14a)
and (14b) to find the steady-state mean response
time and mean queueing time respectively. (This
extends the results of our earlier papers [1,2] to
general n.)

To complete this section, the queue length
distributions for a specific class packet at the
merger node are derived. To do this, we define

$P_i^{(j)}$ = the steady-state probability of having
"i" class "j" packets in the system at
the merger node, i=0,1; j=1,2,...,n

$P_{q(i)}^{(j)}$ = the steady-state probability of having
"i" class "j" packets on queue at the
merger node, i=0,1; j=1,2,...,n.

$P_i^{(j)}$ and $P_{q(i)}^{(j)}$ can be derived using a similar
approach as that used in the proof of Theorem 1.
However, a much simpler derivation is possible if
one looks at (14a) and (14b) and realizes that

$$\bar{N} = \sum_{j=1}^{n} P_1^{(j)} \tag{15a}$$

and

$$\bar{N}_q = \sum_{j=1}^{n} P_{q(1)}^{(j)} \tag{15b}$$

Then, using the fact that not more than a single
packet from each class can be processed over
$(t - n s_{n+1}, t)$, simple symmetry arguments yield

[expression for $P_1^{(j)}$ in terms of the $V_k^{(j)}$, not legible in this copy]   (16a)

$$P_0^{(j)} = 1 - P_1^{(j)}; \quad j=1,2,\ldots,n \tag{16b}$$

and

[expression for $P_{q(1)}^{(j)}$ in terms of the $V_k^{(j)}$, not legible in this copy]   (17a)

$$P_{q(0)}^{(j)} = 1 - P_{q(1)}^{(j)}; \quad j=1,2,\ldots,n \tag{17b}$$

where $V_k^{(j)}$ is found by taking only those terms from
$V_k$ in which $p_j$ appears and is defined as the product
of $p_j$ AND the sum of the products of the $p_i$
taken "k" at a time, i=1,2,...,n; $i \ne j$;
k=0,1,...,n-1.
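
The distribution of Theorem 2 can be evaluated numerically for small n. The sketch below is a hedged illustration, not the author's computation. It reads equation (6) as $P_i = \sum_{j}\sum_{k \ge j} C_{ij}\binom{k}{j}(-n)^{k-j}V_k / \binom{n}{j}$ with $V_0 = 1$. In place of the closed form (6a), it counts the patterns $C_{ij}$ by brute force, assuming that the number of packets left in the system depends only on how many arrivals fall in each slot: every slot older than the most recent one completes one service if any work is present. The function names are hypothetical, and the inputs must satisfy $n p_i \le 1$.

```python
from fractions import Fraction
from itertools import product
from math import comb


def count_patterns(n):
    """C[i][j]: arrival patterns with j arrivals that leave i packets.

    Slot 1 is the most recent slot, slot n the oldest. Assumed occupancy
    rule: walking from slot n down to slot 2, the backlog gains that slot's
    arrivals and loses one completed service; slot 1 arrivals all remain.
    """
    C = [[0] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        for slots in product(range(1, n + 1), repeat=j):
            backlog = 0
            for m in range(n, 1, -1):
                backlog = max(backlog + slots.count(m) - 1, 0)
            left = backlog + slots.count(1)
            C[left][j] += comb(n, j)  # any j of the n classes may arrive
    return C


def elementary_symmetric(p):
    """V[k] = sum of products of the p_i taken k at a time (V[0] = 1)."""
    V = [Fraction(1)] + [Fraction(0)] * len(p)
    for x in p:
        for k in range(len(p), 0, -1):
            V[k] += V[k - 1] * Fraction(x)
    return V


def queue_length_distribution(p):
    """Evaluate P_0..P_n from equation (6) for slot probabilities p_i."""
    n = len(p)
    C, V = count_patterns(n), elementary_symmetric(p)
    P = []
    for i in range(n + 1):
        total = Fraction(0)
        for j in range(i, n + 1):
            for k in range(j, n + 1):
                coeff = Fraction(C[i][j] * comb(k, j) * (-n) ** (k - j), comb(n, j))
                total += coeff * V[k]
        P.append(total)
    return P


P = queue_length_distribution([Fraction(1, 10), Fraction(1, 5)])
```

For two classes the result reduces to $P_0 = 1 - V_1$, $P_1 = V_1 - V_2$ and $P_2 = V_2$, so the probability of an empty node equals one minus the utilization, as expected. The enumeration grows as $n^n$ and is practical only for small n.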

IV. Conclusion: 

In this paper, we have successfully derived 
the steady-state queue length distributions at a 
merger node for a certain class of network condi- 
tions. We calculated these results by an analysis 
of the arrival patterns of the input streams into 
the merger node. We believe this technique can be
utilized in the study of a variety of network
structures. 

References: 

[1] C. Ziegler and D.L. Schilling, "Waiting Times 
at Fast Merger Nodes," Conference Record of ICC 
'80, pp. 23.2-1 - 23.2-6, June 1980.

[2] C. Ziegler and D.L. Schilling, "Waiting Times 
at Very Fast, Constant Service Time Merger Nodes," 
accepted for publication in IEEE Transactions on 
Communications. 

[3] J.D.C. Little, "A Proof of the Queueing Formula
L = λW," Oper. Res., Vol. 9, pp. 383-7, 1961.

Appendix A:

In this appendix, we derive the equation for
$C_{ij}$. This will be done by first solving for some
simple cases and then generalizing the results.
We begin by noting that



$C_{ij} = 0$, for $i > j$        (A-1)

Hence, we must only consider $0 \le i \le j$.

For i=j, all "j" arrivals must occur within
slot 1. Thus, remembering that there are "n" different
classes of packets, we have



(A-2) 



For i=j-1, there are two possibilities;
namely, having "j-1" arrivals in slot 1 and one
arrival in any other slot OR having "i" arrivals
in slot 1 and "j-i" arrivals in slot 2,
i=0,1,...,j-2. Consequently,

v,,rWh-")<-;?(;)fe) 



■ (3) -KM. (3) 



(A-3) 



Moving to the case i=j-2, we find that there 
are now three possibilities: 

1) "j-2" arrivals occur in slot 1 AND "i"
arrivals occur in slot 2 (i=0,1) AND "2-i"
arrivals occur over the remaining "n-2"
slots,

2) "i" arrivals occur in slot 1 (i=0,1,...,j-3)
AND "j-i-1" arrivals occur in slot 2 AND
one arrival occurs in any one of the
remaining "n-2" slots,

3) "i" arrivals occur in slot 1 (i=0,1,...,j-3)
AND "k" arrivals occur in slot 2
(k=0,1,...,j-i-2) AND "j-i-k" arrivals
occur in slot 3.



This yields 



♦I@TM(j:li) 

• (3) i co [fcO<-> 2 -' 



(A-4) 



In a similar manner, one can show that for
i=j-3 there are four possibilities to consider
which yield



(Mj ; 3,-i- k ] (n _ 3) 3-l-k 
+



■ (3) L £ G) M 



Proceeding recursively with this method, one
finds that for the general term, i=j-q, we obtain

C J-q.J- (A - 6) 



+ 



fj-i-k-L-. . .-ml . , q-i-k-L-. . .-m 
q_i_k-L-...-mJ^ n - q; 

-k-L-. . .-m 



+ 



feiilS:::^'"--''-'- 

T T i - **t"' [J] [?] (*ti ... 

i=C k=0 L=0 m=0 ' L K J L L J L m J 



q-2-L- • • .-m 



+ 



+ 



+ 



r j j x ... i (?) (v) p-;i ... (^-] 

i=0 k=0 L=0 m=0 1 ' ^ ; L J ^ J 

J-j-i-k-L- 

T T J "T" - **■?*•" [/I M M 

i=0 k=0 L=0 ' msO . 1 ' 1 •> J 

p_i_k-L...J £j-i-k-L-...-mj 

W Jo Jo Jo Jo p?o "WW 



which can be written as

$C_{ij}$



(A-7) 



as 



I 

x k -0 



x k -0 



_ k-l,2,...,L k»L+l,L+2,... ,j-i-l 



n 

k-l 



j-i-1 
L 

i+L- 7 x 



k-l 
j.Jx 

x. 



j-i-1 



where 



k-l 



i-2+k- ^ V . k«l,2,...,L 



m-1 



(A-7aj 



and 



k-L-1 



z. - k-L- £ V k-L*l,L+2,...,j-i-r 
m«l 



I 

x k -0 



x k -0 



k-l,2,...,L k»L+l,I.+ 2,..., j-i-1 



(A-7b) 



(A-7C) 



i-1 i*l-«i-« 2 *L 1 

II I ... I I I I "... '2 



2-Xj 3-x r x 2 Ij.i.j 



x l-° x 2-° X T° X L"° X L+1"° X L+2-° X L+3"° X -> 



j-i-1" 



An alternate form for $C_{ij}$ is



(A-8) 



(?) t (>-M 



*k 
x k -0 



x k -0 



k-l,2,...,L k-L+l,L+2,..., j-i-1 



L 



k-l 1 



. j-i-1 

n 

k-L+l 



C1«h 2 



k-l 



j-i-L- 2 



ra-1 



j-i-1 



Fig. 1: A Queueing Network Consisting of "n+1" Single Server Queues and
"n" External Inputs with a Merger Node at $N_{n+1}$.

Fig. 2: Illustration of a Random Observation Epoch at the Merger Node of Figure 1.

THE APPLICATION OF MULTIVARIATE STATISTICAL TECHNIQUES
TO COMPUTER PERFORMANCE EVALUATION
USING SIMULATED DATA

This paper considers the application of multivariate
statistical techniques to the analysis of data for use in
computer performance evaluation (CPE). Traditionally,
multilinear regression analysis has been used in the
analysis of CPE data. More recently, cluster analysis has
found applications in this field, both in workload analysis
[1] and in performance modeling [2]. However, both
approaches have problems when applied to the type of data
often encountered in CPE studies. In recent years, several
new statistical techniques have been developed to overcome
or compensate for these problems. Although these
techniques have been used in a variety of business and
social applications, little has been reported on their
applicability to CPE. This study considered a total of
seven multivariate statistical techniques: multilinear
regression (as a baseline technique), cluster analysis,
ridge regression, automatic interaction detection (AID),
canonical correlation analysis, factor analysis, and
discriminant analysis. Each technique was examined for its
theoretical capabilities and expected usefulness in a CPE
environment. Then each technique was applied to several
sets of CPE data. In order to conduct a controlled
experiment, the data was generated by CPESIM, a simulation
of a multiuser mainframe computer system [3]. Based on the
results of their application to simulated CPE data, it was
found that all of the techniques could be useful to varying
degrees and in varying ways [4]. Some gave better
predictability than regression, based on the r-squared
value; some gave a better analysis of interrelationships
between parameters; and so forth. Because of their
individual natures, each technique was found to be
most useful for particular types of problems. Thus a given
CPE problem might not be able to use all of the techniques.
As a whole, however, these techniques should be considered
by anyone performing statistical analysis of computer
system data.

Key Words: Automatic interaction detection (AID);
canonical correlation analysis; cluster analysis; computer
performance analysis; computer performance evaluation; CPE;
discriminant analysis; empirical modeling; factor analysis;
multilinear regression analysis; ridge regression;

1. Introduction

There has been considerable interest over the years
in modeling computer systems for computer performance
evaluation (CPE) purposes. Three types of models are
usually recognized: simulation models; analytical models
(such as queueing); and empirical models. The use of these
models usually falls into one of two categories. These are
for predictive purposes (to pose "what if" types of
questions for planning) and for explanatory purposes (to
gain a better understanding of how the system works).

The two types of empirical models classically used in CPE
are multilinear regression analysis and cluster analysis.
Typically regression analysis has been used for performance
modeling (e.g. modeling turnaround time as a function of
several workload parameters) [1], while cluster analysis
has been used for modeling workloads [2], although an
extension of clustering to performance modeling has been
suggested [3]. However, several other statistical
techniques have become available in recent years. These
have found useful applications in the areas of operations
research and in the social sciences. Little, however, has
been reported as to their applicability in the area of
computer performance evaluation.



This study considers the application of five new
techniques to CPE use, along with multilinear regression
analysis and cluster analysis as bases for comparison
[4], [5]. The five techniques considered are ridge
regression, Automatic Interaction Detection (AID),
canonical correlation analysis, factor analysis, and
discriminant analysis. The objective of this paper is not
to provide a theoretical foundation for these techniques
(although appropriate references are provided for the
interested reader), but rather to present an overview of
the techniques and the results of the application of
these techniques in a CPE environment.

1 Figures in brackets indicate the literature references
at the end of this paper.



The basic shortcomings of regression analysis and
cluster analysis, along with a brief introduction to the
five new techniques, are provided in Section 2. Then the
CPE environment in which this study was made is discussed
in Section 3. In Section 4, each of the techniques is
presented, along with a discussion of its results in the
experiment. Finally, a summary of the results and
associated conclusions is presented.



2. Overview of the Techniques

Multilinear regression analysis was probably the
earliest empirical modeling technique used in CPE. The
basic approach of regression analysis is to model a
dependent variable as a function of several independent
(or predictor) variables, on the assumption that this
represents a cause and effect relationship. In its most
popular form, multiple independent variables are allowed,
and the dependent variable is expressed as a linear
combination of the independent variables, with the
coefficients being the primary model parameters. In an
empirical technique a black-box approach is taken. Rather
than start with an understanding of how the system works,
a model is assumed (e.g. the linear relationship just
described) and the model parameters are evaluated from
empirical data. This is accomplished by observing the
values of the dependent and independent variables in the
real system being modeled, then finding a "best" fit of
the model to this empirical data.

Several statistical measures can be calculated to
determine how well the model fits the data. Perhaps the
most important is the coefficient of determination, often
referred to as the r-squared value. This number (between
0 and 1.0) indicates the proportion of variation of the
dependent variable about its mean value that can be
explained by the model. Thus a model of computer
turnaround time with an r-squared value of .82 explains
82% of the variation in turnaround. Although an r-squared
value of 1.0 would be ideal, due to the stochastic nature
of CPE data such a value is not to be expected. An
r-squared greater than [...]. A further measure of the
model itself is the statistical significance of each of
the coefficients. This provides a means of testing the
hypothesis that a given coefficient is zero. If that
hypothesis cannot be rejected, then the coefficient can be
considered statistically zero, which means that the
corresponding independent variable does not influence the
dependent variable, and thus can be eliminated from the
model. This is an important tool when using the model in
an explanatory mode, to understand how the system works.
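As an illustration of the fit and of the r-squared measure described above, the following is a minimal sketch in plain Python. The function names (`solve`, `fit_ols`, `r_squared`) and the five job records are hypothetical and are not taken from CPESIM or from the paper; the coefficients are found by solving the normal equations, which is one common way to obtain a least-squares fit.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] minimizing the squared error."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(rows, y, coefs):
    """Proportion of the variation of y about its mean explained by the model."""
    mean_y = sum(y) / len(y)
    pred = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical jobs: (CPU seconds, memory units) and observed turnaround.
JOBS = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0)]
TURNAROUND = [6.5, 8.0, 13.8, 15.1, 21.1]

coefs = fit_ols(JOBS, TURNAROUND)
r2 = r_squared(JOBS, TURNAROUND, coefs)
print(coefs, r2)
```

Because the made-up turnaround values are nearly linear in the two predictors, the r-squared printed here is close to, but below, 1.0. Real CPE data would normally give a lower value.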

There are many restrictive assumptions made about the
data when performing a multilinear regression analysis.
The most important is that the relationship is linear. In
many real cases it is not. However, the ability to detect
nonlinearity and the formulation of an appropriate
nonlinear regression model are not simple, and are beyond
the experience of many practical CPE analysts. Another
problem that is common in a CPE environment is that the
independent variables are not truly independent from each
other, but rather may be highly correlated with each
other. This has the result of distorting the
coefficients, and can lead to a variable being dropped
from the model as unimportant when in fact that is not
the case. If such a high correlation does exist between
variables, the data is said to be multicolinear. The
predictive capability of the resulting regression model
is often satisfactory, but from an explanatory point of
view the importance of predictor variables may be
misrepresented in the model.

Ridge regression attempts to reduce the effects of
multicolinearity in a regression model [4][5]. As such,
it is basically a variation of multilinear regression
analysis. A small amount of bias is injected into the
calculations in order to generate a set of coefficients
that compensate for the effect of multicolinearity.

Factor analysis carries this approach one step further
by trying to find a surrogate variable or "factor" to
replace the two or more variables that show high
correlation [6]. Thus in the previous example a single
factor "job size" might replace the correlated variables
memory size and CPU time.


Although originally used as a tool to help
characterize workloads for generating simulation models,
cluster analysis can help detect multicolinearity. Thus,
for example, one might find high correlation between
memory requirements and CPU time. This might show up as a
cluster of jobs with small memory requirements and small
CPU requirements, and another cluster of jobs requiring
both large amounts of memory and large CPU times. In
addition, cluster analysis is useful even when the
relationship between predictor variables is nonlinear.

Discriminant analysis is similar to cluster analysis
in that one tries to determine a discriminant function
which can then be used to assign jobs to different groups
[7]. Data known to be in separate groups is first
analyzed to determine the appropriate discriminant
function. The resulting function can then be used to
classify new jobs into the appropriate group. This
technique requires that the groups be known in order that
the discriminant function can be determined, while
traditional clustering algorithms determine the groups by
iterative techniques.



Canonical correlation analysis is similar to factor
analysis in that it attempts to reduce the dimensionality
of the data. It relates one group of variables to a
second group of variables. This implies a data set with
multiple dependent variables. An example might include
turnaround time, I/O time required, and input queue time
as dependent variables, with CPU time, number of disk
I/Os, number of tape I/Os, and number of lines printed as
independent variables.



Another problem with multilinear regression analysis
occurs when the value of one variable changes the
coefficient (i.e. the effect) of another variable. For
example, as a computer job's memory requirement gets
larger, the effect of that job's CPU time on its
turnaround time may get smaller (perhaps because fewer
jobs can fit into the system, hence the job gets more
frequent access to the CPU). Thus the coefficient of CPU
time over all jobs isn't really a constant, although
regression analysis would assign one. Such data is said
to be nonadditive. This condition affects both the
predictive and explanatory power of the model, although
the former can be corrected for, once nonadditivity has
been detected.

Automatic Interaction Detection (AID) is useful in
detecting nonadditivity [8]. It is similar to cluster
analysis in that it attempts to partition the data into
separate groups. However, whereas cluster analysis looks
only at the independent variables to form clusters, AID
uses predefined boundaries in the dependent variable to
group the data. If nonadditivity is detected, it can be
corrected for in the regression model by introducing a
crossproduct term between the variables in question.
\ 

Clearly there are some differences in the types of
data or situation in which these different techniques
might be applied. It is unlikely that all techniques will
be useful in a given problem. In order to compare their
usefulness, however, an attempt was made to apply them in
as similar an environment as possible. This environment
is discussed in the next section.



3. The Experimental Environment

In order to test the various statistical techniques
under consideration, a source of computer system
performance data was needed. However, as noted in the
last section, not all of the techniques were expected to
be equally applicable to a given problem. On the other
hand, if significantly different systems were used for
the tests, then comparison of the results would be less
meaningful. Therefore, it was decided that for the
initial evaluation of these techniques simulated data
would be used. Due to its ready availability, user
familiarity, and the fact that it was developed for CPE
experiments, the CPESIM simulation was selected.

The CPESIM simulation was developed at the Air Force
Institute of Technology in 1979 as a teaching aid for the
CPE course taught there [9]. CPESIM is a simulation of a
single processor, multiprogrammed, batch oriented
mainframe. The peripheral and operating system
configurations and the workload can be controlled. The
simulated computer processes the simulated workload, and
generates accounting data, software monitor data, and
hardware monitor data. Thus the use of this simulation
allows data to be generated under controlled and variable
conditions.



The CPESIM accounting data was used as the source of
the data for this study. Table 1 shows the variables
used. Turnaround time was used as the dependent variable
in all cases, and I/O time was also used as a dependent
variable in the canonical correlation analysis.



Table 1. CPESIM Independent Variables
(Defined for Each Computer Job)

Variable   Description

CPU        CPU seconds used
MEMORY     K-bytes of CM used
CARDS      Number of cards read
LINES      Number of lines printed
DISKIO     Number of disk accesses
TAPEIO     Number of tape accesses
TAPES      Number of tapes mounted
TARRIV     Job arrival time (in hours)
IOTIME     Total job I/O time (seconds)
           (highly correlated with DISKIO and TAPEIO)
TURN       A job's batch turnaround time



Two primary data sets and two variants were created
using CPESIM. These are defined in Table 2. Not all of
these sets were used for every technique. In addition,
Data Set 3 was created for use in demonstrating ridge
regression. This latter set was not generated by CPESIM,
but was hand-built.



Table 2 - CPESIM Data Sets

Data Set  Observations  Description

1         787           Complex workload from four
                        organizations, each different, over
                        five 8-hour days. High degree of
                        competition for resources. IOTIME
                        not included.

1a        787           Same as Set 1 except IOTIME is used
                        as an independent variable,
                        introducing known multicolinearity.

2         106           Simplest data set from one
                        organization over 4 days. Virtually
                        no competition for resources. IOTIME
                        is included.

2a        452           Same as Set 2 except extended to
                        twenty days data.

3         25            This set consists of three
                        variables. X1 and X2 are
                        statistically independent, while X2
                        and X3 are highly correlated.

4. Test Results

Multilinear regression analysis was used to model
turnaround time as a function of the independent
variables. This provided a baseline for comparison of the
other techniques. The regression results are indicated in
Table 3. The multicolinearity index was generated using
the ridge regression technique, as presented later in
this paper. Note that Data Set 1, which excludes IOTIME,
exhibits little multicolinearity, while Data Sets 1a and
2 show increased levels of multicolinearity due to the
high correlation of IOTIME with TAPEIO and DISKIO. This
effect is diminished in Data Set 1a due to the complexity
of the system and the high competition for resources.
Data Set 3 was regressed both without X3 and with X3
included. Note that the coefficients of X2 and X3 have a
compensating effect, resulting in a predictive capability
equivalent to that obtained when X3 was excluded from the
model.

4.1 Ridge Regression

The first new technique considered was ridge
regression. The objective of this technique is to detect
the presence of multicolinearity and to correct for it by
generating an improved set of regression coefficients.
There is no intent to change the predictive capability of
the model. The benefit gained is a set of coefficients
that more truly represent the importance of the predictor
variables. The ridge regression program generates the new
coefficients and also a Variance Inflation Factor (VIF)
which is an index of the amount of multicolinearity. Thus
little change would be expected for Data Set 1, some
change for Data Set 1a, and noticeable change for Data
Set 2.



Table 3 - Multilinear Regression Results
(Standardized Coefficients)

Data  CPU   Central  Tape    Lines  Disk  Tape   I/O   R-    Multico-
Set   Time  Memory   Drives  Prntd  I/Os  I/Os   Time  Sqrd  linearity
                                                             Index

1     .123  .359     .223    .046   .156  .097         .424  1.2
1a    .112  .371     .235    .025   .053  -.132  .265        8.3

Data  Tape    Disk     Tape     I/O     R-     Multicolinearity
Set   Drives  I/Os     I/Os     Time    Sqrd   Index

2     .00094  -.00009  -.00150  .00173  .8334  69.5

Data                       R-     Multicolinearity
Set   X1    X2     X3      Sqrd   Index

3     4.07  0.34
3     3.84  28.68  -28.54  .803   2458.9

The approach is to recalculate the regression
coefficients, introducing a small amount of bias k into
the calculations. When k=0, the same values are generated
as for multilinear regression. It is desirable to keep k
as small as possible. The algorithm increases k
iteratively until some heuristic stopping criterion is
met. Those criteria considered in this effort include (1)
all VIFs less than 10 [5]; (2) all VIFs less than 1 [4];
and (3) all signs correct [5]. Data Set 3 is used to
illustrate the effect. In this set X1 and X2 are
statistically independent while X2 and X3 are almost
perfectly correlated.
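The iteration described above can be sketched for the simplest case of two standardized predictors. This is an illustration under stated assumptions, not the ridge program used in the study: the function names, the correlation value 0.999, the correlations with the dependent variable, and the step size are all hypothetical. The VIFs are taken as the diagonal of (R + kI)^-1 R (R + kI)^-1, a common textbook definition that the study's program may or may not share.

```python
def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][t] * q[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]


def ridge(rho, r_y, k):
    """Standardized ridge coefficients and VIFs for two predictors.

    rho is the correlation between the predictors, r_y their correlations
    with the dependent variable, and k the bias added to the diagonal.
    """
    R = [[1.0, rho], [rho, 1.0]]
    A = inv2([[1.0 + k, rho], [rho, 1.0 + k]])
    coefs = [A[i][0] * r_y[0] + A[i][1] * r_y[1] for i in range(2)]
    V = matmul2(matmul2(A, R), A)
    return coefs, [V[0][0], V[1][1]]


def smallest_k(rho, r_y, vif_limit, step=0.001):
    """Increase k in small steps until every VIF is below vif_limit."""
    k = 0.0
    while max(ridge(rho, r_y, k)[1]) >= vif_limit:
        k += step
    return k


# Hypothetical, almost perfectly correlated pair of predictors.
RHO = 0.999
R_Y = [0.80, 0.79]

ols_coefs, ols_vifs = ridge(RHO, R_Y, 0.0)   # k = 0 is ordinary regression
k = smallest_k(RHO, R_Y, vif_limit=10.0)
ridge_coefs, ridge_vifs = ridge(RHO, R_Y, k)
print(ols_coefs, ols_vifs)
print(k, ridge_coefs, ridge_vifs)
```

With k = 0 the two coefficients are large and of opposite sign, the same compensating pattern the text notes for X2 and X3. A small bias brings the VIFs below the limit and shrinks both coefficients.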



The results are shown in Table 4. For Data Set 3, two
of the heuristics resulted in nearly the same
coefficients as were generated by multilinear regression
when X3 was not included. For Data Set 2, the
coefficients changed significantly with little
degradation in r-squared. For Data Set 1a, there was a
noticeable but small change in the coefficients. Again
r-squared was not affected. Finally, as expected, for
Data Set 1 two of the heuristics indicated that no bias
should be inserted at all. The criterion "all VIFs less
than 1" did require a bias of k=0.105, but the actual
coefficients changed little.



Table 4. Ridge Regression Results (Standardized Coefficients).

Data Set 1 (I/O Time Excluded)

       CPU   Central  Tape    Lines  Disk  Tape  I/O
k      Time  Memory   Drives  Prntd  I/Os  I/Os  Time  Heuristic

0.000  .123  .359     .223    .046   .156  .097        VIFs<10, Signs
0.105  .118  .329     .211    .053   .145  .098        VIFs < 1

Data Set 1a (I/O Time Included)

       CPU   Central  Tape    Lines  Disk  Tape   I/O
k      Time  Memory   Drives  Prntd  I/Os  I/Os   Time  Heuristic

0.000  .112  .371     .235    .025   .053  -.132  .265  OLS
0.015  .111  .365     .231    .030   .084  -.059  .181  VIFs < 10
0.070  .109  .347     .222    .037   .105  .002   .117  Signs
0.110  .108  .335     .216    .040   .107  .015   .105  VIFs < 1

Data Set 2

       Tape    Disk     Tape     I/O
k      Drives  I/Os     I/Os     Time    Heuristic  R-sqr

0.000  .00094  -.00009  -.00150  .00173  OLS        .8334
0.012  .00133  .00001   .00025   .00051  VIFs<10    .7553
0.044  .00124  .00000   .00000   .00026  Signs      .7199
0.092  .00102  .00001   .00006   .00019  VIFs<1     .7093

Data Set 3

k      X1      X2    X3     Heuristic      R-sqr

0.000  3.84    28.7  -28.5  OLS            .8030
0.004  4.0[?]  1.12  -0.79  VIFs < 10      .7932
0.024  3.97    0.34  -0.01  VIFs < 1
0.026  3.97    0.33  0.00   Signs correct  .7922






Overall, ridge regression did improve the regression
coefficients, most notably in those cases exhibiting
large multicolinearity. The criteria "all VIFs less than
1" and "all signs correct" gave approximately the same
results, and appeared to give better results than "all
VIFs less than 10".

4.2 Factor Analysis 

Factor analysis provides another solution to the
problem of multicolinearity. The objective here is to
find the underlying structure of the data, then generate
a new set of variables, called "factors," to describe the
data. Usually there are fewer factors than original
predictor variables, and little or no multicolinearity
should exist among them. Thus for Data Set 3, two factors
would be generated. One would be equivalent to X1, while
the other would be representative of X2 and X3.

This technique yields results that may be more useful
from an explanatory point of view than ridge regression.
The latter may still tend to hide the importance of a
variable. Consider Data Set 3. One could argue that both
X2 and X3 are equally important, and either could be used
for predictive purposes. While ridge regression reduced
the inflated coefficient of X2 to reflect its proper
importance, it essentially eliminated X3 by giving it a
near zero coefficient. Factor analysis would explicitly
show that there were two underlying factors that
characterize the data, and that both X2 and X3 were
equally related to the second factor. One then usually
has two options for replacing predictor variables with
factors. One approach is to use surrogate variables. Here
a predictor variable is found that has a high correlation
with the factor (called a "high loading"), and that
variable is used. In Data Set 3, such an approach would
indicate the use of X1 and X2 (or equally X1 and X3) as
the surrogate variables. If no single variable is highly
loaded on a factor, then the factor values may be
calculated as a weighted (by the loadings) sum of the
predictor values. Once one of these techniques has been
applied, then multilinear regression is applied to the
factors to generate the new model. It should be noted
that factor analysis is applied to only the predictor
variables; the dependent variable is not included in the
factor analysis itself.
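To make the idea of a loading concrete, the sketch below extracts only the first, unrotated principal component of a small correlation matrix by power iteration. This is a simplification and not the procedure used in the study, which reports varimax-rotated factors. The function name and the correlation matrix (patterned loosely on the description of Data Set 3, with an assumed correlation of 0.99 between X2 and X3) are hypothetical.

```python
import math


def first_factor_loadings(corr, iterations=200):
    """Loadings of each variable on the first principal component of corr.

    Power iteration converges to the eigenvector with the largest
    eigenvalue; a loading is that eigenvector scaled by sqrt(eigenvalue).
    """
    n = len(corr)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    return [x * math.sqrt(eigenvalue) for x in v], eigenvalue


# Assumed correlations: X1 unrelated to the others, X2 and X3 nearly identical.
CORR = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.99],
    [0.0, 0.99, 1.0],
]

loadings, eigenvalue = first_factor_loadings(CORR)
share = eigenvalue / len(CORR)  # fraction of predictor variation explained
print(loadings, share)
```

In this assumed example X2 and X3 load almost equally and very highly on the first component while X1 does not load on it at all, so either X2 or X3 could serve as the surrogate variable for that factor.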



This technique was applied to Data Sets 1, 1a, and
2a. The results are shown in Table 5. In all cases only
two underlying factors could be found. The results for
Data Set 1a are very appealing. IOTIME, DISKIO, and
TAPEIO were highly loaded on one factor, while everything
else was moderately loaded on the other factor. A good
approach might be to replace IOTIME, DISKIO, and TAPEIO
with Factor One, and leave the other variables alone. 57%
of the variation in the predictor variables is accounted
for by the factors for this data set. Similarly, the
results for Data Set 2a indicate that one factor
represents disk I/O (which seems to account for most of
the I/O time) while the other factor represents tape I/O.
The high correlations result in these factors accounting
for 89% of the variation in the predictor variables. The
results for Data Set 1 are more difficult to interpret.
Factor one is a combination of MEMORY, TAPES, LINES, and
TAPEIO, while factor two seems to depend primarily on
TARRIV and CARDS. Only 43% of the variation in the
predictor variables is modeled by these factors.



Table 5. Factor Analysis Results
(Varimax Rotated Factor Matrix After
Rotation with Kaiser Normalization)

                     Principal Components
Data                 Factor Number
Set    Variable      1        2

1      TARRIV        .1350    .6048
       CPU           .5659    .2576
       MEMORY        .6370    .3271
       TAPES         .7529    .0738
       CARDS         .2642    .6574
       LINES         .5887    .0785
       DISKIO        .3402    .4274
       TAPEIO        .6297    .1912

1a     IOTIME        .9553    [?]
       CPU           .1737    .6184
       MEMORY        .0189    .6904
       TAPES         .3825    .6244
       LINES         .1280    .6591
       DISKIO        .6140    .0529
       TAPEIO        .8411    .2412

2a     IOTIME        .9469    .3111
       TAPES         .0074    .8883
       DISKIO        .9876    .0349
       TAPEIO        .2140    .8694

When highly related predictor variables are in the
model, factor analysis would seem to do a better job of
eliminating multicolinearity than ridge regression.
However, it is a more involved procedure.

4.3 Cluster Analysis 

Cluster analysis provides another method for
attacking the problem of some underlying relationship
between the supposedly independent predictor variables.
Multicolinearity implies a linear relationship between
the predictor variables. Thus a scattergram of two highly
correlated variables would resemble a straight line.

Consider a nonlinear example. Suppose that all jobs
that require small amounts of memory use large amounts of
CPU time. In addition, all jobs that require large
amounts of memory require large amounts of CPU time. All
jobs with small CPU requirements use an intermediate
amount of memory. A scattergram for such a workload might
appear as in Figure 1. CPU time and memory are not highly
correlated, nor would factor analysis find a single
underlying factor to replace both of them. Yet clearly
the two are not truly independent.



Figure 1. Nonlinear Relationship of Data
(scattergram; axes: CPU Time and Memory)



Cluster analysis approaches this problem by
partitioning the workload into clusters, or groups, of
similar jobs, based on the workload parameters (i.e. the
predictor variables). Then a separate regression model
can be developed for each cluster. Cluster analysis
starts with no knowledge of such clusters, and then
iteratively converges to a set of natural clusters.
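The paper does not say which clustering algorithm was used. As one illustration of an iterative procedure of this kind, the sketch below uses k-means, which is a substitution chosen for brevity. The function name, the six (memory, CPU) points, and the starting centers are hypothetical.

```python
def kmeans(points, centers, iterations=20):
    """Assign each point to its nearest center, then move each center to
    the mean of its points; repeat for a fixed number of iterations."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        centers = [
            tuple(sum(v) / len(g) for v in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups


# Hypothetical jobs as (memory, CPU time): one small clump, one large clump.
POINTS = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]

centers, groups = kmeans(POINTS, centers=[(0.0, 0.0), (5.0, 5.0)])
print(centers)
print(groups)
```

Each resulting group could then be given its own regression model, as the text suggests. With real workload data the number of clusters and the starting centers would need to be chosen with care.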



When cluster analysis was applied to the data sets in
question, natural clusters were found. Much of the
workload was discarded as falling between clusters. No
improvement to the turnaround time model was found using
the resulting clusters. This is primarily a result of the
fact that CPESIM currently generates the workload
parameters independently, and does not provide for
producing related parameters. However, previous studies
[3] have indicated a usefulness of cluster analysis to
this problem. In addition, cluster analysis has been well
accepted by the CPE community for workload analysis and
generation of simulation inputs [2].

4.4 Discriminant Analysis 

Once it is known that jobs can be categorized into
discrete groups, discriminant analysis provides a
technique for classifying them. The groups must be known
in advance. Discriminant analysis determines a
discriminant function, based on empirical data, that can
then be used to determine the proper group for new jobs.
Some possible applications to CPE data are such things as
(1) to which user organization does each job belong?; (2)
to which cluster does a job belong (but this can be done
using cluster analysis)?; or (3) to which class of
turnaround times will a job belong? This latter question
might be useful in predicting turnaround times. A problem
is that discriminant analysis works best when there are
clearly defined, distinct groups. Turnaround time tends
to be a continuously distributed variable. Thus
arbitrarily partitioning turnaround time will tend to
generate groups that are not well separated.

This technique was applied to Data Set 1. For the
first test, turnaround time was partitioned into three
classes: jobs with turnaround times of less than 0.5
hour; those with turnaround times between 0.5 and 1.5
hours; and those jobs with turnaround times greater than
1.5 hours. Half of the 787 cases were randomly selected
and used to generate the discriminant function; the other
half were used to test it. Table 6 shows the resulting
three group centroids. Table 7 shows the resulting
classification errors.

In an attempt to find more clearly separable groups, the data was divided
into two groups at a turnaround time of 3.2 hours. These represented
"regular" jobs and slow jobs. However, only nine jobs fell into the slow
group, leaving 778 cases in the regular group. The two group centroids are
shown in Table 8, while the classification errors are indicated in Table 9.
Note that 113 of the regular jobs were misclassified as slow jobs,
considerably more than the nine actual ones.

Table 6. Group Centroids

    Group      Discriminant
    Number     Function

      1           1.226
      2          -0.191
      3          -1.086

Table 7. Classification Matrices for
Discriminant Analysis Example, 3 Groups

Results for Cases Used for Analysis

    Actual    Number          Predicted Group
    Group     of Cases      1         2         3

      1         107         85        18         4
                          79.4%     16.8%      3.7%
      2         186         48        84        54
                          25.8%     45.2%     29.0%
      3          88          5        25        58
                           5.7%     28.4%     65.9%

59.58% grouped correctly.

Results for Cases Not Used for Analysis

    Actual    Number          Predicted Group
    Group     of Cases      1         2         3

      1         121         95        23         3
                          78.5%     19.0%      2.5%
      2         192         53        79        60
                          27.6%     41.1%     31.3%
      3          93          4        31        58
                           4.3%     33.3%     62.4%

57.14% grouped correctly.

Table 8. Discriminant Function Coefficients

    Group       Discriminant
                Function

    Normal        -0.02472
    Outliers       2.13691

Table 9. Classification Matrices for
Discriminant Analysis Example, 2 Groups

Classification Results

    Actual      Number        Predicted Group
    Group       of Cases     Normal    Outliers

    Normal        778          665       113
                              85.5%     14.5%
    Outliers        9            1         8
                              11.1%     88.9%

85.51% of all cases classified correctly.
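The "grouped correctly" percentages quoted with Tables 7 and 9 follow directly from the diagonals of the classification matrices. The short sketch below recomputes them from the cell counts as read from those tables; the function name `percent_correct` is introduced here only for illustration and is not part of the original study.

```python
def percent_correct(matrix):
    """Percentage of cases on the diagonal of a square classification matrix.

    Rows are actual groups, columns are predicted groups.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


# Cell counts as read from Table 7 (three turnaround-time classes).
table7_analysis = [[85, 18, 4], [48, 84, 54], [5, 25, 58]]
table7_holdout = [[95, 23, 3], [53, 79, 60], [4, 31, 58]]

# Cell counts as read from Table 9 (normal vs. outlier jobs).
table9 = [[665, 113], [1, 8]]

if __name__ == "__main__":
    for name, m in [("Table 7, analysis cases", table7_analysis),
                    ("Table 7, holdout cases", table7_holdout),
                    ("Table 9", table9)]:
        print(f"{name}: {percent_correct(m):.2f}% grouped correctly")
```

Note that the high overall figure for Table 9 is dominated by the 778 regular jobs; of the 121 jobs predicted to be slow, only 8 actually were.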

Discriminant analysis is not a useful tool unless the data is known to
fit into well defined, distinct classes. The CPESIM data did not
provide a good example of such an environment.



4.5 Canonical Correlation Analysis

Canonical correlation analysis is a technique for reducing the
dimensionality of data. It relates two sets, each of which contains several
related variables. Similar to regression analysis, which models a
single dependent variable as a function of a set of predictor variables,
this technique allows a set of related dependent variables to be modeled
as a function of a set of predictor variables. Note that each set should
consist of variables that are somehow related.

The approach is to define two canonical variates, X* and Y*, where

    X^{*} = \sum_i A_i X_i        (1)

and

    Y^{*} = \sum_i B_i Y_i        (2)

where the X_i represent the independent, or predictor, variables and the
Y_i represent the dependent variables. These two models are solved
simultaneously for the coefficients subject to maximizing the correlation
between X* and Y*. The coefficients indicate the contribution of each
variable to the corresponding canonical variate. Canonical loadings measure
the correlation of each variable with its canonical variate. Redundancy is
defined as a measure of the amount of variation in the predictor set that
is explained by the dependent set, similar to the r-squared value in
regression.
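In matrix terms, the simultaneous solution can be written as a single maximization. The following is the standard textbook formulation, given here as a sketch because the paper does not spell out the algebra; the vectors a and b (collecting the coefficients A_i and B_i) and the sample covariance matrices \Sigma_{xx}, \Sigma_{yy}, \Sigma_{xy} are notation assumed for this illustration.

```latex
\rho \;=\; \max_{a,\,b}\;
  \frac{a^{T}\Sigma_{xy}\,b}
       {\sqrt{a^{T}\Sigma_{xx}\,a}\;\sqrt{b^{T}\Sigma_{yy}\,b}},
\qquad X^{*} = a^{T}X,\quad Y^{*} = b^{T}Y .

% The maximizing a is an eigenvector of the following problem, and the
% largest eigenvalue is the squared canonical correlation:
\Sigma_{xx}^{-1}\,\Sigma_{xy}\,\Sigma_{yy}^{-1}\,\Sigma_{yx}\;a \;=\; \rho^{2}\,a .
```

Successive eigenvectors give the further solutions (CANVAR1, CANVAR2, and so on), each uncorrelated with the earlier ones.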

This technique was applied to Data Sets 1a and 2a. Turnaround time and
IOTIME were included in the dependent set as related variables, and the
remaining predictor variables were included in the other set. The results
for Data Set 1a are shown in Table 10. For the Y variate, note the high
loading of IOTIME for one solution and of turnaround time for the other
solution. This implies that these two dependent variables are not really
related. Note also the loadings of the X variates. For this case, two
independent regression models would have been more appropriate. Table 11
presents the results for Data Set 2a. These results are more useful. Note
that both dependent variables are highly loaded on the canonical variate
in the first solution. Notice the high redundancy. The X loadings indicate
the importance of the individual predictor variables in determining the
dependent set, which might be characterized as "job run time" in this
case.



Thus canonical correlation has limited usefulness in CPE. The primary
problem with its application is defining a set of related dependent
variables of interest.



4.6 Automatic Interaction Detection (AID) 

Automatic interaction detection (AID) is another data partitioning
scheme, but with a different objective. Its purpose is to detect the
presence of nonadditivity, or interaction between variables. The data is
split into two groups based on one of the predictor variables in order to
form the two groups with the least within-group variation of the dependent
variable values and the maximum between-groups variation. Then each
group is further split and so forth, creating a tree. If the resulting tree
is symmetrical, then no interaction between variables is suspected. A
nonsymmetric tree indicates that interaction exists. Such nonadditivity
can be corrected for (in a predictive sense) by adding product terms to the
regression model.
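A single AID splitting step can be sketched as a search over cut points of one predictor for the cut that maximizes the between-groups sum of squares of the dependent variable (equivalently, minimizes the within-group sum of squares, since the total is fixed). This is a minimal illustration under that assumption, not the AID program used in the study; the function name and the midpoint threshold convention are choices made here.

```python
def best_binary_split(x, y):
    """Find the cut on predictor x that maximizes between-group variation of y.

    Returns (threshold, between_ss) for the split x <= threshold vs. x > threshold,
    or None if x has no two distinct values.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total = sum(v for _, v in pairs)
    grand_mean = total / n
    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between equal predictor values
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        between_ss = (i * (left_mean - grand_mean) ** 2
                      + (n - i) * (right_mean - grand_mean) ** 2)
        if best is None or between_ss > best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, between_ss)
    return best


if __name__ == "__main__":
    # Toy data: the dependent variable jumps when the predictor exceeds 2.
    print(best_binary_split([1, 2, 3, 4], [1.0, 1.0, 5.0, 5.0]))
```

Applying the same search to each resulting group, over every predictor, grows the tree whose symmetry or asymmetry is then inspected.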

This technique was applied to Data Sets 1 and 2. Figure 2 shows the
resulting tree for Data Set 2. Since this was the simple data set,
intuitively no interaction was expected, and the symmetric tree bears
that out. Figure 3 is the tree for Data Set 1. Here asymmetry is clearly
evident, indicating that interaction is present. Adding the two product
terms (MEMORY)x(TARRIV) and (DRIVES)x(TARRIV) improved the r-squared value
from 0.424 to 0.505. These terms make sense in light of the fact that other
techniques have indicated a problem due to tape drives and accesses, and as
the system starts to backlog, jobs with a later arrival time will tend to
be impacted even more.

Thus automatic interaction detection does seem to be a good tool
for the detection of interaction (nonadditivity) between variables in
CPE modeling.

Table 10. Results of Canonical Correlation Analysis
(Data Set 1a)

                      CANVAR1                 CANVAR2
                           Canonical               Canonical
    Variate        Coeff    Loading        Coeff    Loading      Total

    x-variables
    TARRIV         0.0070   -0.0353        0.4777    0.4997
    CPU            0.0105    0.2998        0.0845    0.2573
    MEMORY         0.0068    0.2334        0.3789    0.5366
    DISKIO         0.2157    0.3842       -0.0695   -0.0415
    TAPEIO         0.9290    0.9755       -0.4147   -0.0224
    TAPES         -0.0019    0.4612        0.7743    0.6981
    CARDS          0.0061    0.2566       -0.0179    0.0350
    LINES          0.0203    0.2743       -0.0662    0.1526
    Redundancy               0.1972                  0.0798      0.2770

    y-variables
    TURN           0.0026    0.3713        1.0760    0.9285
    IOTIME         0.9990    1.0000       -0.3995   -0.0024
    Redundancy               0.5615                  0.2462      0.8077

Table 11. Results of Canonical Correlation Analysis
(Data Set 2a)

                      CANVAR1                 CANVAR2
                           Canonical               Canonical
    Variate        Coeff    Loading        Coeff    Loading      Total

    x-variables
    TARRIV        -0.0004    0.0126       -0.0077   -0.0949
    CPU            0.0040   -0.0104       -0.6379   -0.6499
    MEMORY        -0.0034    0.0054       -0.0350   -0.0454
    DISKIO         0.8702    0.9183        0.0474    0.0154
    TAPEIO         0.3932    0.4988        0.0752    0.0675
    TAPES         -0.0011    0.2606       -0.0158    0.0054
    CARDS          0.0112    0.0788       -0.3000   -0.3323
    LINES          0.0490    0.0863       -0.6908   -0.6916
    Redundancy               0.1453                  0.1057      0.2510

    y-variables
    TURN          -0.0369    0.7569       -1.7040   -0.6041
    IOTIME         1.0297    0.9998        1.3582   -0.0217
    Redundancy               0.8095                  0.1504      0.9599

5. Conclusions

This study considered the application of seven techniques to the
empirical modeling of turnaround time in a CPE environment. Multiple linear
regression analysis was used as a baseline on four sets of data. Then
each of the other techniques was applied to see if the model could be
improved. It was found that ridge regression was a useful tool in
removing the distortion of regression coefficients due to multicolinearity,
but that it tended to give the proper coefficient for one variable at the
expense of the related variable. Factor analysis, on the other hand,
took a more basic approach by finding a new set of variables, or factors,
to characterize the workload. The relation of multicolinear variables to
these factors could be clearly seen, and multicolinearity was removed from
the model when the factors were used in place of the original predictor
variables. Cluster analysis extended this solution to cases where a
nonlinear relationship exists between predictor variables by partitioning
the workload into sets of similar jobs, so that when modeled separately
multicolinearity was not a problem. Discriminant analysis appears to have
limited use in CPE, providing a technique for classifying jobs into
previously defined, clearly distinct groups. Similarly, canonical
correlation analysis seems to be useful only when a group of dependent
performance variables can be defined to be modeled as a function of the
predictor variables. Finally, Automatic Interaction Detection (AID) was
found to be a useful tool for the detection of nonadditivity, or
interaction, between variables.

Current efforts are under way to test these techniques further. Data is
being generated to allow explicit demonstration of the capabilities of
cluster analysis, and to demonstrate the ability to use several of these
techniques jointly to improve a given regression model. In addition to
these tests with simulated data, an effort is under way to apply these
techniques to data measured on an actual online system.

Figure 2. Augmented AID Skeleton Tree for Data Set 2a

Figure 3. Augmented AID Skeleton Tree for Data Set 1
IMPROVING THE ACCURACY OF A WORKING-SET-ORIENTED
GENERATIVE MODEL OF PROGRAM BEHAVIOR 1
An experiment based on trace-driven simulation is carried out to study various
improvement strategies for a working-set-oriented generative model. Working
set size strings extracted from a real program trace are used as inputs to the
generative model. The memory demand and page fault rate of these artificial
memory reference strings are compared with those of the original real reference
string under the working set policy. The same strings are also tested under two
different memory management policies: the page fault frequency policy and the
least recently used policy. Artificial strings generated with proper strategies
behave quite well under both the working set and the page fault frequency
policies. However, they behave less than satisfactorily under the least
recently used policy.

Key words: Generative model; program behavior; working set policy; workload 
characterization. 



1. Introduction 

In the study of program behavior, memory 
referencing patterns have long been a subject of 
interest. On the one hand, various memory manage- 
ment policies have been proposed and implemented to 
optimize the utilization of system resources and
increase system performance. On the other hand, 
techniques that enable compilers to generate better- 
behaved code in order to achieve the same purpose 
have been studied. Various methods for restructuring 
programs so as to improve their referencing behavior 
have also been investigated [Ferr76a]. 

Very often, in these studies, program reference strings are used to
compare the performances of various memory management policies or to
validate models of program behavior. These strings are usually collected
by interpretively executing real programs on an existing system, but this
process is rather tedious and generally very expensive. The use of
artificial strings instead of real strings has some clear advantages if the
former reflect the behavior of the latter. Artificial strings are usually
produced by generative models of program behavior [Sage73a] that
ideally should use relatively few parameters and be based on relatively
fast algorithms.

1 The research reported here has been supported by the National Science
Foundation under grant MCS80-12000.

It is usually not necessary to reproduce a program's behavior exactly;
consequently, the generative model will try to reproduce those properties
of a real string which are deemed to be important in the context in which
the artificial string is to be used. In this study, the memory demand and
the page fault rate have been chosen as the important aspects of a
real string to be reproduced. Our primary context is that of the working
set policy [Denn68a], but we shall also be comparing a real and several
artificial strings under the page fault frequency policy [Chu76a] and
the least recently used policy [Matt70a]. For convenience, these policies
shall be referred to as WS, PFF, and LRU respectively.

The generative model evaluated here, which is based on a working set
characterization and reproduces given working set size dynamics, was
proposed by Ferrari [Ferr81a] and first implemented by Dutt [Dutt81a].
This paper deals with various improvement strategies that can be applied
to this working-set-oriented generative model.

The generative model is described in the next section. The proposed
improvement strategies for the model and the design of the experiments are
detailed in Sections 3 and 4, respectively. The results and their analysis
are presented in Section 5. Finally, in the last section, conclusions are
drawn from the outcome of the experiment, and directions for future
research are provided.

2. Background 

The goal of a generative model is to provide a sequence of page references
whose characteristics of interest are similar to those of the modeled
program. The generative model evaluated here is based on the sequence (or
on sequences) of working set sizes and possibly on other parameters. While
it is desirable to reproduce the dynamic behavior of the modeled program
as accurately as possible, it is also important to minimize the number of
parameters needed for the generative model. A sequence of working set
sizes is generally redundant in its information content, as it can be
completely specified by the sequence of the pairs (t_i, w_i), where t_i is
the time at which the working set size (wss) curve changes its slope, and
w_i is the wss at that time. This sequence is the input to
the generative model used in the experiment.
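The reduction of a wss curve to (t_i, w_i) pairs can be sketched as follows. The code assumes that time is counted in references starting at 0 and that the working set at time t consists of the distinct pages among the last T references; the function names are introduced here for illustration only.

```python
def working_set_sizes(refs, T):
    """Working set size after each reference, for window size T (in references)."""
    counts = {}
    sizes = []
    for t, page in enumerate(refs):
        counts[page] = counts.get(page, 0) + 1
        if t >= T:
            # The reference made T steps ago falls out of the window.
            old = refs[t - T]
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        sizes.append(len(counts))
    return sizes


def slope_change_pairs(wss):
    """Keep only the (t, w) points where the wss curve changes slope.

    The first and last points are always kept so the curve can be rebuilt
    by linear interpolation between consecutive pairs.
    """
    n = len(wss)
    pairs = []
    for t in range(n):
        if t == 0 or t == n - 1 or wss[t] - wss[t - 1] != wss[t + 1] - wss[t]:
            pairs.append((t, wss[t]))
    return pairs


if __name__ == "__main__":
    sizes = working_set_sizes([1, 2, 1, 3, 3, 3], T=2)
    print(sizes)
    print(slope_change_pairs(sizes))
```

For a long trace the number of pairs is far smaller than the number of references, which is the economy the characterization relies on.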

When a sequence of wss's is given, the window 
size T used to define the working sets must always be 
specified. The reference string generated by the 
model on the basis of the given wss curve does not 
coincide with the original real string (if there is one). 
However, with appropriate generation algorithms, the 
generated string might be made to possess the essen- 
tial characteristics of the modeled one. In principle, 
all possible reference strings of length n form a vector
space S^n, where S is the page set; by specifying a
particular sequence of wss's, we focus on the subspace 
of this vector space whose elements all have the given 
wss characterization. 



The input to the generative model is the given wss sequence, which is
defined for a given window size T. The wss sequence corresponding to the
artificial string produced by the model with a different window size will
generally be different from that of the modeled program under the same
conditions. Performance indices of other kinds are usually different also.

The string generation algorithms used in this study can be classified into
two general categories: the single T generation algorithms and the double
T algorithms. By 'single T generation' we refer to the generation of an
artificial string based on one wss sequence, corresponding to one value of
T. By 'double T generation' we refer to the generation of an artificial
string based on two wss sequences. By specifying two wss sequences with two
different window sizes, the subset of reference strings that can be
generated is further constrained. With the use of appropriate generation
strategies, the artificial string is expected to represent the modeled
program's behavior under various memory management policies more
accurately than the string generated by a single T wss characterization.

The properties of working set size strings and the feasibility of both
single and double T generation algorithms are treated extensively in
[Ferr81b], [Ferr82a], and [Lee82b].

3. Improvement Strategies 

The original generative model incorporated a very simple strategy, since
its main purpose was to gain some insight into the behavior of artificial
reference strings [Dutt81a]. The artificial reference string was generated
with a single working set size string extracted from a real program
reference string. In the strategy, three queues of pages are maintained:
the external queue (E), the candidate queue (C), and the forbidden queue
(F). Initially, all pages are in the external queue. Whenever the working
set size increases, a new page is chosen from the external queue and used
as the page to be referenced next. When the working set size does not
increase, the next reference is chosen from the candidate queue. Now, if
the working set size does not decrease T units of time later, the chosen
page is put into the candidate queue, since it is to be re-referenced
within the next working set window. Otherwise, the page is put into the
forbidden queue, whose members cannot be referenced. Furthermore, if the
working set size decreases, the page in the forbidden queue that was
referenced T time units earlier is moved to the external queue for
future use. The candidate queue is managed by a FIFO policy for
simplicity, and the external queue by a LIFO policy for economy.
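The three-queue strategy can be sketched in a few lines. This is an illustrative reading of the description above, not the authors' program: it assumes the wss sequence is given as one value per reference, that it was derived from a real string with the same window T (so the candidate queue is never empty when needed), and that pages are numbered 0 to num_pages-1. All names are introduced here.

```python
from collections import deque


def working_set_sizes(refs, T):
    """Working set size after each reference, for window size T."""
    counts, sizes = {}, []
    for t, page in enumerate(refs):
        counts[page] = counts.get(page, 0) + 1
        if t >= T:
            old = refs[t - T]
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        sizes.append(len(counts))
    return sizes


def generate_single_t(wss, T, num_pages):
    """Generate an artificial reference string from one wss sequence."""
    external = list(range(num_pages - 1, -1, -1))  # LIFO stack; pop() yields page 0 first
    candidate = deque()                            # FIFO queue of re-usable pages
    forbidden = {}                                 # release time -> page that must not be reused
    refs, prev, n = [], 0, len(wss)
    for t in range(n):
        delta = wss[t] - prev
        if delta < 0:
            # The page referenced T units earlier leaves the working set.
            external.append(forbidden.pop(t))
        if delta > 0:
            page = external.pop()        # new page enters the working set
        else:
            page = candidate.popleft()   # re-reference a page already in the set
        refs.append(page)
        # Look ahead: does the wss decrease T units from now?
        if t + T < n and wss[t + T] < wss[t + T - 1]:
            forbidden[t + T] = page
        else:
            candidate.append(page)
        prev = wss[t]
    return refs


if __name__ == "__main__":
    real = [0, 1, 0, 1, 1, 1, 2, 2]
    target = working_set_sizes(real, T=3)
    artificial = generate_single_t(target, T=3, num_pages=3)
    print(artificial, working_set_sizes(artificial, T=3) == target)
```

In this small example the artificial string touches only two distinct pages while the real one touches three, which illustrates the "economy" of the LIFO external queue.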

With these simple data structures and strategies, the artificial string
generated is fairly accurate under the WS policy with a window size larger
than or equal to the one used in its generation. However, its behavior
under both the PFF and LRU policies is unacceptable [Dutt81a]. Under the
PFF policy, if enough pages manage to be in memory, no further page fault
can ever occur. Under the LRU policy, the cyclic referencing behavior due
to the FIFO management of the candidate queue creates a large number of
page faults. Based on these observations, three general improvement
strategies were proposed.

(1) Use Two Working Set Size Strings

The idea behind this approach is that forcing the artificial string to
follow two dynamic wss characterizations with two reasonably spaced
working set windows may force it to reproduce the original string's
behavior more accurately. For example, the artificial string generated
with this strategy may not be as sensitive to a change of window size.

(2) Use All Pages

All distinct pages that are referenced in the real program reference
string will be put in the external queue and used in some fashion. Instead
of managing the external queue by a LIFO policy, new pages are chosen
according to one of two criteria. The first is to use the pages in the
external queue in FIFO order. Intuitively, each page will be referenced in
the artificial string when its turn comes. The second is to use the pages
in the external queue in such a way as to match the real program profile,
i.e., the relative referencing frequency of each page. The rationale
behind the latter approach is to account, at least to some extent, for the
identity of the referenced pages; this attempt at reproducing referencing
frequencies may improve the behavior of the artificial reference string
under various memory management policies.

(3) Re-use Previously Referenced Page

The cyclic referencing pattern of the original strategy causes severe
problems with the LRU policy. Program referencing behaviors generally
exhibit a high degree of locality. In particular, there is a high
probability that the current page coincides with the page just referenced.

4. Design of the Experiments 

4.1. Parameters of the Model

The parameters of the model are estimated from a real trace. This trace
was obtained from the interpretive execution of an APL program on an IBM
360/91 machine. Except for the wss characterizations, which were obtained
from the first 550,000 references, all parameters were derived for the
first 500,000 references. Various performance indices of the real trace,
later used for comparisons, were also gathered by trace-driven simulations
from the same 500,000 references.



Three wss characterizations were obtained, with window sizes of 5000,
10000, and 20000 references respectively. For single T generation, the wss
characterization with window size 10000 was used, whereas the other two
were used for double T generation. As mentioned in Section 2, a wss
characterization is a sequence of (t, w) pairs, where t is a time at which
the slope of the wss curve changes, and w is the value of the wss at that
time. The numbers of pairs needed by the three characterizations of the
APL program trace are 1157, 610, and 525 for window sizes of 5000, 10000,
and 20000, respectively. The nominal window size of 10000 was chosen for
two reasons. First, this window is not so short as to obliterate the
program's phase transition behavior [Dutt81a] and not so long as to
require too much memory space. Secondly, in the neighborhood of window
size 10000, the space-time product curve shows rather stable and
relatively low values.

The coefficient of resilience is defined here as the probability that the
page referenced next is the same as the currently referenced one. In
essence, this is the probability of referencing the top of the stack in
an LRU environment (it is often called d_1 in the stack distance
probability distribution [Spir77a]). The value of the estimate of this
parameter from the real trace is 0.544.
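Estimating the coefficient of resilience from a trace amounts to counting immediate repetitions. The sketch below follows the definition in the text directly; the function name is introduced here for illustration.

```python
def coefficient_of_resilience(refs):
    """Fraction of references that repeat the immediately preceding page."""
    if len(refs) < 2:
        raise ValueError("need at least two references")
    repeats = sum(1 for prev, cur in zip(refs, refs[1:]) if prev == cur)
    return repeats / (len(refs) - 1)


if __name__ == "__main__":
    print(coefficient_of_resilience([1, 1, 2, 2, 2, 3]))
```

If each reference repeats the previous page independently with probability c, the number of consecutive references to the same page is geometrically distributed with mean 1/(1 - c).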

The number of distinct pages in the first 500,000 references of the real
trace was found to be 110. Not surprisingly, the most frequently
referenced page accounts for 25 percent of the references; also, 20
percent of the pages account for 86 percent of the references.

4.2. Performance Indices 

Various performance indices were chosen for comparing real and artificial
strings. To compute the space-time product, the page wait time was assumed
to be constant and equal to 10000 references. The primary performance
indices considered in the various contexts are listed below.

(1) WS environment

Mean working set size, page fault rate, space-time product, working set
size distribution, and interfault time distribution are the primary
indices we are interested in when the WS policy is used.

(2) PFF environment

Mean working set size, page fault rate, space-time product, working set
size distribution, and interfault time distribution are the primary
indices of concern in the PFF case. The parameter of the PFF algorithm,
i.e., the threshold of interfault times, was chosen to equal 1513
references. This is the value of the mean interfault time obtained under
the WS policy with a window size equal to 10000 references.

(3) LRU environment

Page fault rate, space-time product, interfault time distribution, and
stack distance probability distribution are the primary indices of concern
in the LRU case. For the stack distance probability distribution, the
probability d_1 of referencing the top of the stack is particularly
important. The parameter of the LRU policy, i.e., the fixed partition
size, was chosen to be 21 page frames. This is the mean working set size
with a window size of 10000. Under the assumptions made by Denning and
Schwartz [Denn72a], this choice for LRU should produce the same page fault
rate as the WS policy with window size 10000.
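For reference, counting page faults under LRU with a fixed partition can be sketched as below. This is a generic LRU simulation, not the simulator used in the experiments; the function name and the use of `OrderedDict` as the LRU stack are choices made here.

```python
from collections import OrderedDict


def lru_page_faults(refs, frames):
    """Count page faults for a reference string under LRU with a fixed partition."""
    stack = OrderedDict()  # least recently used page first, most recent last
    faults = 0
    for page in refs:
        if page in stack:
            stack.move_to_end(page)        # hit: page becomes most recently used
        else:
            faults += 1
            if len(stack) >= frames:
                stack.popitem(last=False)  # evict the least recently used page
            stack[page] = True
    return faults


if __name__ == "__main__":
    refs = [1, 2, 3, 1, 4, 2]
    faults = lru_page_faults(refs, frames=3)
    print(faults, faults / len(refs))
```

Dividing the fault count by the string length gives the page fault rate compared across the real and artificial strings.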

4.3. Overview of the Experiments 

4.3.1. Artificial Strings

Twelve artificial strings, each 500,000 references 
long, were generated and compared with the real 
string. The names of the artificial strings reflect the 
three control variables of the experiment : the number 
of wss characterizations, the way to select a new page 
when one is needed, and the method for re-using 
pages already in the working set. 

A string name consists of three characters XYZ, where X is S in the case
of single T generation, and D in the case of double T generation; the
binary variable Y is 0 if the old pages are re-used in FIFO order (i.e.,
pages are selected from the candidate queue in the order in which they
were put in), and is 1 if the probability of referencing the previously
referenced page rather than the first in the candidate queue is taken
into account (in this case, the number of consecutive references to the
same page is geometrically distributed); finally, the variable Z is 0 if
new pages are selected from the external queue in LIFO order, 1 if
new pages are selected from the external queue in FIFO order (in this
case, the external page queue initially contains a number of pages equal
to the total number of distinct pages referenced in the real string), and
2 if new pages are selected from the external queue according to a given
relative frequency distribution of page references (in this case, the
usage record of each page is continuously updated so that the appropriate
page can be chosen to match the given frequency distribution).
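The naming scheme is simply the cross product of the three control variables, which a short sketch makes explicit. The dictionary labels paraphrase the text and the function names are introduced here for illustration.

```python
from itertools import product

GENERATION = {"S": "single T", "D": "double T"}
REUSE = {"0": "candidate queue in FIFO order",
         "1": "previous page re-referenced with given probability"}
NEW_PAGE = {"0": "external queue in LIFO order",
            "1": "external queue in FIFO order",
            "2": "external queue by relative reference frequency"}


def string_names():
    """All XYZ names in the order S00, S01, ..., D12."""
    return ["".join(parts) for parts in product("SD", "01", "012")]


def describe(name):
    """Decode a three-character string name into its three settings."""
    x, y, z = name
    return GENERATION[x], REUSE[y], NEW_PAGE[z]


if __name__ == "__main__":
    for name in string_names():
        print(name, describe(name))
```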

4.3.2. Data structures 

To generate the six strings named S**, the simple 3-queue data structure
described in Section 3 is used.

To generate the six strings named D**, we could operate with two sets of
queues, each corresponding to one wss characterization: one set will
consist of C_1, F_1, and E_1, and the other of C_2, F_2, and E_2. Each
page would be at any given time in one (and only one) of the three queues
in each set. However, if these six queues were implemented as described,
it would be necessary to calculate a large number of intersections of two
queues, one from each set. To speed up the generation algorithm, a data
structure consisting of five doubly-linked lists (queues) was implemented.
Each queue in the structure corresponds to a particular intersection of
the two queues mentioned above. Fortunately, not all possible
intersections of the six original queues are needed in the generation of
reference strings. The five (intersection) queues required are C_1C_2,
E_1E_2, C_1E_2, C_1F_2, and F_1F_2.

All pages are initially in the E_1E_2 queue. When in both wss
characterizations the wss increases, a new page is selected from the
E_1E_2 queue according to the strategy specified by variable Z. When in
both wss characterizations the wss does not increase, a page is selected
from the C_1C_2 queue. When the wss with the larger T does not have to be
increased and the other needs to be increased, a page is selected from
the C_1E_2 queue.

After the page to be referenced next is chosen, the queue to which this
page should be added is to be selected. When the page has to be
re-referenced in order to remain in both working sets later, the page is
put into the C_1C_2 queue. When the page has to drop out of both working
sets later, and cannot therefore be re-referenced until then, it is put
into the F_1F_2 queue. When the page has to remain in the working set with
the larger T, but has to drop out of the working set with the smaller T,
the page is put into the C_1F_2 queue.

A page stays in the F_1F_2 queue until it drops out of both working sets,
at which time the page is released and moved to the E_1E_2 queue.
Similarly, a page stays in the C_1F_2 queue until it drops out of the
working set with the smaller T, at which time the page is released and
sent to the C_1E_2 queue. No other transitions of pages between queues are
possible. It should be noted that this arrangement of the data structures
also maintains the chronological ordering of arrival of the pages in each
queue. Therefore, no search is needed to select and delete a page when the
FIFO strategy is used.

5. Experimental Results and Their Analysis 

The generation of 500,000 references with a single window size took from
275 seconds to 121 seconds of VAX-11/780 CPU time depending on the options
chosen. The generation of 500,000 references with two window sizes took
from 300 seconds to 466 seconds of VAX-11/780 CPU time depending on the
options chosen. The remarkable efficiency of the double T generation
algorithm is due to the carefully planned structure of the queues, which
eliminates the need to do linear searches on them in most cases. An
optimized version of the generation program, without statistics gathering,
is expected to run twice as fast as the version we have used to gather
such voluminous data as the page reference distribution.

The artificial strings were run under various memory management policies,
and their performance indices compared with those produced under the same
policies by the real string. These comparisons are discussed in the
following subsections. Notice that string S00 is essentially the same as
the string generated in an earlier experiment [Dutt81a].

5.1. Characterization of Artificial Strings

Two statistics are listed in Table 1 for comparison: the total number of
distinct pages used in each string, and the coefficient of resilience. The
number of distinct pages used in S02, S12, D02, and D12 would reach 110 if
the string generated were infinitely long. However, due to the very small
probability densities at the tail of the distribution, only a fraction of
all pages are actually used when generating 500,000 references.



Table 1. Characteristics of the Strings

    string    number of         coefficient of
              distinct pages    resilience

    S00            56              0.000
    S01           110              0.000
    S02            80              0.000
    S10            55              0.544
    S11           110              0.544
    S12            80              0.544
    Real          110              0.544
    D00            78              0.000
    D01           110              0.000
    D02            80              0.000
    D10            78              0.544
    D11           110              0.544
    D12            80              0.544



5.2. WS policy results

Artificial strings were executed under the WS
policy with window size 10000 if they were generated
with one T, and also with other window sizes if they
were generated with two T's. Their performance
indices were found to be exactly the same as those of
the real string executed under the same window
size(s). This was expected, but strengthened our
confidence in the correctness of the generation
programs. The results for the real string under the WS
policy with several different window sizes are given in
Table 2.
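
As an illustration of how three of the indices reported for the WS policy are defined, the following Python sketch simulates a working-set policy over a reference string. It is not the authors' simulator: the function name `simulate_ws` and the returned keys are hypothetical, the reference string is assumed to be non-empty, and only mean working set size, maximum working set size, and page fault rate are computed.

```python
from collections import OrderedDict


def simulate_ws(refs, window):
    """Simulate a working-set policy with the given window size.

    A page is resident at time t if it was referenced in the last
    `window` references, i.e. at some time in (t - window, t].
    """
    resident = OrderedDict()  # page -> time of last reference, oldest first
    faults = 0
    total_size = 0
    max_size = 0
    for t, page in enumerate(refs):
        if page in resident:
            resident.move_to_end(page)  # page becomes the most recent one
        else:
            faults += 1                 # page was outside the working set
        resident[page] = t
        # Drop pages whose last reference fell out of the window.
        while resident:
            oldest_page, oldest_time = next(iter(resident.items()))
            if oldest_time <= t - window:
                del resident[oldest_page]
            else:
                break
        size = len(resident)
        total_size += size
        max_size = max(max_size, size)
    n = len(refs)
    return {
        "mean_wss": total_size / n,
        "max_wss": max_size,
        "page_fault_rate": faults / n,
    }


result = simulate_ws([1, 2, 1, 3, 1, 2], window=2)
```

Running the same function over an artificial string and over the real string, with the same window size, gives the kind of side-by-side comparison shown in the tables of this section.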



Table 2. WS Results for the Real String

window   mean    max   changes    space time   page fault   max interfault
size     wss     wss   of slope   product      rate         time
20000    20.17   78     525       1.10E8       0.000562     111605
15000    23.67   67     554       1.07E8       0.000502     111605
10000    20.00   56     610       1.06E8       0.000648     111695
 7500    19.29   51     742       1.14E8       0.000770      65292
 5000    16.90   45    1157       1.20E8       0.00118       32092







Table 3. WS Policy (T=7500)

string   mean    max   changes    space time   page fault   max interfault
         wss     wss   of slope   product      rate         time
D**      19.38   51    772        1.18E8       0.000800     80577
Real     19.29   51    742        1.14E8       0.000770     65292








Table 4. WS Policy (T=10000)

string   mean    max   changes    space time   page fault   max interfault
         wss     wss   of slope   product      rate         time
D**      21.05   56    613        1.05E8       0.000642     111695
Real     20.00   56    610        1.06E8       0.000648     111695








Table 5. WS Policy (T=15000)

string   mean    max   changes    space time   page fault   max interfault
         wss     wss   of slope   product      rate         time
D**      23.71   67    532        1.05E8       0.000570     111695
Real     23.67   67    554        1.07E8       0.000502     111605

5.4. LRU policy results

The performances of all artificial strings under
LRU differ significantly from that of the real string.
Even after making various modifications to the original
simple generation algorithm, accuracies are still
unsatisfactory. During the simulation, the LRU stack
distance probability distribution was also obtained. The
probabilities d_1 of referencing the top of the stack are
reported in Table 7. One reason for the inaccuracy of
the artificial strings is that the value of the parameter
m, the number of page frames allocated to the
program in the LRU experiments, was not appropriate.
The page fault rate produced by the real program
trace with a memory allocation of 21 page frames was
more than twice that produced by the WS policy with
window size 10000. It is clear that the APL trace
does not satisfy all of the assumptions made by
Denning and Schwartz [Denn72a]. Assuming that the
reference string is stationary is probably unrealistic in
this experiment. Similar findings were reported in the
literature by other authors (see for example
[Smit76a]).
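
To make the quantities in this subsection concrete, the sketch below simulates a fixed-allocation LRU policy with m page frames and records the stack distance of every re-reference. It is a minimal illustration rather than the simulator used in the experiments; the name `simulate_lru` is hypothetical, and d_1 is estimated here simply as the fraction of all references that hit the top of the LRU stack.

```python
def simulate_lru(refs, frames):
    """Simulate LRU with `frames` page frames using an LRU stack.

    Returns (page fault rate, estimate of d_1, stack distance counts).
    """
    stack = []       # index 0 is the most recently used page
    distances = {}   # stack distance -> number of references
    faults = 0
    for page in refs:
        if page in stack:
            distance = stack.index(page) + 1
            distances[distance] = distances.get(distance, 0) + 1
            if distance > frames:
                faults += 1          # page is known but not resident
            stack.remove(page)
        else:
            faults += 1              # first reference to this page
        stack.insert(0, page)
    n = len(refs)
    d1 = distances.get(1, 0) / n     # share of references to the stack top
    return faults / n, d1, distances


fault_rate, d1, distances = simulate_lru([1, 1, 2, 1, 3, 2], frames=2)
```

Because the stack is independent of the number of frames, one pass yields the fault rate for any m: a reference faults exactly when its stack distance exceeds m or the page has never been seen.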

A new value for the parameter m was obtained
by trying to match the measured page fault rate of
the real string under the WS policy with window size
10,000 references. This new value of m turned out to
be 31 page frames. The performance indices obtained
from the LRU policy with the new m are given in
Table 8. As shown in Table 9, this change resulted in
an improvement of almost one order of magnitude in
the accuracy of the artificial strings. However, the
accuracy is still unsatisfactory. Working-set-oriented
artificial strings cannot be reliably used in LRU
environments.

results are summarized in Table 6.



Table 6. PFF Policy (T=1543)

string   mean    max   changes    space time   page fault   max interfault
         wss     wss   of slope   product      rate         time
S00      55.50   56     56        4.30E7       0.000112     490024
S01      20.07   67    318        1.02E8       0.000648     111695
S02      26.16   61    260        8.83E7       0.000528     111695
S10      55.5    56     56        4.32E7       0.000112     490021
S11      20.07   67    318        1.02E8       0.000643     111695
S12      27.68   61    255        8.89E7       0.000516     111695
Real     20.71   67    411        1.17E8       0.000842      80339
D00      21.01   67    381        1.14E8       0.000776     101010
D01      20.63   67    413        1.18E8       0.000848      80399
D02      20.93   67    409        1.18E8       0.000836      80399
D10      21.91   67    381        1.14E8       0.000776     101010
D11      20.03   67    413        1.18E8       0.000848      80399
D12      20.63   67    412        1.18E8       0.000816      80399

When generating artificial strings with two window
sizes, it is of interest to investigate the accuracy
of the characterization for values of T between the
two window sizes used in the generation phase. The
results for three intermediate values of T are
summarized in Tables 3, 4, and 5. Not only the first
moment results but also the distributions were found
to be very close (within 5 percent) to those of the real
string [Lee82a].

As shown in the tables, the results for all
artificial strings named D** were found to be
identical. This observation was generalized in the following
theorem, whose proof is lengthy and has therefore
been omitted here (it can be found in [Lee82a]).

Theorem

Strings generated from two wss characterizations
with window sizes T_a and T_b, with T_a < T_b,
have the same wss characterizations for any T
such that T_a < T < T_b.

5.3. PFF policy results

That the accuracy of strings S00 and S10 is quite
low under the PFF policy is not too surprising. This
inaccuracy is due to the fact that, when new pages are
taken from the external queue in LIFO order, no
further page faults can occur after all pages are
brought into memory. However, if in the string
generation algorithm we increase the size of the page
population at the time a new page is selected, we can
adequately reproduce the performance of the real
string under the PFF policy. Also double T generation
produces accurate artificial strings. This
encouraging result is due in part to the similarity
between the WS and PFF policies in their dynamic
and adaptive allocation of memory to programs. The



Table 7. LRU Policy (m=21)

string   page fault   max interfault   space time   d_1
         rate         time             product      (percent)
S00      0.217        111834           2.28E10       0.0
S01      0.217        111695           2.28E10       0.0
S02      0.217        111695           2.28E10       0.0
S10      0.100        111834           1.05E10      54.4
S11      0.100        111695           1.05E10
S12      0.099        111695           1.04E10
Real     0.00146      111416           1.63E08      54.4
D00      0.130        111702           1.36E10       0.0
D01      0.130        111563           1.36E10       0.0
D02      0.130        111563           1.36E10       0.0
D10      0.0605       111702           6.36E09      54.4
D11      0.0606       111563           6.37E09      54.4
D12      0.0601       111563           6.32E09      54.4




Table 8. LRU Policy (m=31)

string   page fault   max interfault   space time   d_1
         rate         time             product      (percent)
S00      0.00990      175482           1.55E9        0.0
S01      0.0102       111695           1.59E9        0.0
S02      0.0101       111698           1.58E9        0.0
S10      0.00476      175482           7.54E8       54.4
S11      0.00502      111695           7.94E8       54.4
S12      0.00504      111765           7.96E8       54.5
Real     0.000670     175392           1.19E8       54.4
D00      0.00896      175392           1.40E9
D01      0.00913      111695           1.43E9        0.0
D02      0.00908      111695           1.42E9        0.0
D10      0.00439      175392           6.96E8       54.4
D11      0.00456      111695           7.23E8       54.4
D12      0.00454      111695           7.20E8       54.4



Table 9. Ratio of LRU Performance Indices

         page fault rate      space time product
string   (artificial/real)    (artificial/real)
         m=21      m=31       m=21      m=31
S00      148.63    14.78      139.88    13.03
S01      148.63    15.22      139.88    13.36
S02      148.63    15.07      139.88    13.28
S10       68.49     7.10       64.42     6.34
S11       68.49     7.49       64.42     6.67
S12       67.81     7.52       63.80     6.69
D00       89.04    13.37       83.44    11.76
D01       89.04    13.63       83.44    12.02
D02       89.04    13.55       83.44    11.93
D10       41.44     6.55       39.02     5.85
D11       41.51     6.81       39.08     6.08
D12       41.16     6.78       38.77     6.05
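
The ratios in this table are the artificial string's index divided by the real string's index for the same m. The short Python check below recomputes the page fault rate ratios for two strings from the LRU page fault rates as read from the m=21 and m=31 tables; the dictionary and function names are hypothetical and serve only as a worked arithmetic example.

```python
# Page fault rates under LRU, keyed by number of page frames m.
REAL_FAULT_RATE = {21: 0.00146, 31: 0.000670}
ARTIFICIAL_FAULT_RATE = {
    "S00": {21: 0.217, 31: 0.00990},
    "D00": {21: 0.130, 31: 0.00896},
}


def fault_rate_ratios(artificial, real):
    """Return artificial/real page fault rate ratios, rounded to 2 places."""
    return {
        name: {m: round(rate / real[m], 2) for m, rate in by_m.items()}
        for name, by_m in artificial.items()
    }


ratios = fault_rate_ratios(ARTIFICIAL_FAULT_RATE, REAL_FAULT_RATE)
```

For S00 the ratio falls from 148.63 at m=21 to 14.78 at m=31, which is the improvement of almost one order of magnitude noted in the text.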



6. Conclusions 

The tradeoff among the possible choices for a
generation algorithm has two aspects: accuracy and
complexity. A strategy more sophisticated than the
simplest one should definitely be selected if
performance indices are unacceptably inaccurate without it.
Such a strategy should also be incorporated into the
generation algorithm if it adds relatively little to the
complexity of the implementation and to the cost of
the generation process, while appreciably increasing
the accuracies of some of the performance indices of
interest.

In a WS environment, since the accuracy of the
double T generative model is practically independent
of which strategies are chosen, it is clear that
simplicity is the primary concern. D00 could be a
reasonable candidate; however, D10 has a better coefficient
of resilience. Single T generation is not considered
here because of the clear advantage of double T
generation in the sensitivity of the accuracy to the
choice of working set window size.

Double T generation provides acceptable results
even in the PFF policy case. It is clear that taking
into account the number of distinct pages and using
them in FIFO order when a new page is needed is
very cost-effective.
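
For readers unfamiliar with the PFF policy referred to above, the following Python sketch shows one common formulation of it. This is an assumption-laden illustration, not the simulator used in the paper: the name `simulate_pff` is hypothetical, the interfault time is measured in references, and the resident set is trimmed only when the interfault time strictly exceeds the threshold.

```python
def simulate_pff(refs, threshold):
    """Simulate a page-fault-frequency policy.

    At each fault, if the time since the previous fault exceeds
    `threshold`, pages not referenced since that fault are released;
    otherwise the resident set simply grows by the faulting page.
    """
    resident = set()
    used_since_fault = set()   # pages referenced since the last fault
    last_fault_time = None
    faults = 0
    total_size = 0
    for t, page in enumerate(refs):
        if page not in resident:
            faults += 1
            if last_fault_time is not None and t - last_fault_time > threshold:
                resident = set(used_since_fault)   # release unused pages
            resident.add(page)
            used_since_fault = set()
            last_fault_time = t
        used_since_fault.add(page)
        total_size += len(resident)
    n = len(refs)
    return {"page_fault_rate": faults / n, "mean_resident_size": total_size / n}


result = simulate_pff([1, 2, 3, 3, 3, 3, 4, 1], threshold=2)
```

Like the WS policy, this scheme adapts the memory allocation to the program's recent behavior, which is consistent with the similarity between the two policies noted in the PFF results.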

The results of the LRU experiments were not
very satisfactory. We should not expect that a
WS-based approach for string generation will
automatically provide good accuracy under the LRU
policy. Even in this case, double T generation with
the minimum number of pages (D10) is the most
cost-effective solution among those studied in this
paper.

All things considered, a double T generation
algorithm is undoubtedly a better choice than any single
T generation algorithm. The coefficient of resilience
can be taken into account by the algorithm to
improve the accuracy in a non-WS environment. At
the same time, having the artificial string reference
the same number of distinct pages as the real one is
also very beneficial. In summary, the D11 strategy
should be used.

To improve the accuracy of the model in an LRU
environment, incorporating into the algorithm the
first order properties of LRU behavior may prove
useful. For instance, considering more than one stack
distance probability, e.g., d_2 and d_3, in the generation
phase of the algorithm may shape the artificial string
to be more LRU-like.

The obvious extension of the double T generation
approach is a triple T generation algorithm [Ferr82a].
With three reasonably spaced window sizes, the
model's accuracy in terms of first moment results as
well as of distributions should increase. However, it is
not known whether the further gain in model
accuracy will justify the increase in the complexity of the
generation algorithm.



SOFTWARE IMPROVEMENT PROGRAM

Opal R. Stroup 

Defense Mapping Agency 
U.S. Naval Observatory 
Washington, DC 20305

This paper summarizes DMA's approach to upgrade its SPERRY Scientific and
Technical software while modernizing the Agency's software practices. The
objectives of the five-year software improvement program are: increased productivity;
improved software quality, maintainability, reliability, and portability; and
standardization of software development practices. Both current problems and the
program to introduce a modern programming environment, improve existing software,
and upgrade personnel skills to support the new environment are detailed.

Key words: automated verification; COBOL; DMA; FORTRAN; modern programming;
programming standards; software conversion; software improvement; SPERRY 1100;
structured programming



1. Introduction

The Defense Mapping Agency (DMA) is currently
in the initial stages of a five-year program to
upgrade its SPERRY 1100 Scientific & Technical
(S&T) software while modernizing the Agency's
software production practices. Ultimate
objectives of this Software Improvement Program (SIP)
are: increased productivity; improved software
quality, maintainability, reliability, and
portability; and standardization of software
development practices. This paper provides an
overview of the DMA Software Improvement Program.
Before describing the Software Improvement
Program, it is appropriate to mention DMA's
mission, products, and organization.

2. DMA Mission and Products 

DMA's mission is to provide Mapping, Charting
and Geodetic (MC&G) support and services for the
Secretary of Defense, the Joint Chiefs of Staff
and military departments, and other DoD
Components through the production and worldwide
distribution of maps, charts, precise positioning
data, and digital data for strategic and tactical
military operations and weapons systems. DMA's
mission also includes carrying out statutory
responsibilities to provide nautical charts and
marine navigational data for use by U.S. vessels
and navigators in general.



DMA has five Components including two
Production Centers (the Aerospace Center (AC) in
St. Louis, MO, and the Hydrographic/Topographic
Center (HTC) in Brookmont, MD) which produce the
MC&G products and data.

Software may be considered a subproduct of
the previously mentioned products rather than an
end-product in itself. DMA's software is used to
produce, maintain, store, and manipulate data, to
drive mapping and charting equipment, produce and
validate mathematical models, generate data in
digital format, and to perform other functions
which create DMA products.

3. Background

The DMA computing environment includes SPERRY
1100 mainframes as illustrated in Figure 1.
The SPERRY acquisition history from 1972 to the
present as well as the systems which will be in
place in 1983 are also illustrated. In 1978, DMA
initiated the "Phase II Computer Replacement
Program" to competitively acquire computing
capacity to support the Agency in the 1982-1990
time frame, by replacing the four SPERRY 1100s
(one 1108 and one 1100/42 per Center). As part of
the replacement activity, DMA and the Federal
Conversion Support Center (FCSC) performed a
software conversion cost analysis (using the FCSC
cost model) for each of several acquisition

Year   AC              HTC
1972   1108            1108
1977   1108            1108
       1100/42         1100/42
1980   1100/81         1100/81
       1100/82         1100/81
1983   1100/82 2x1     1100/82 2x1
       1100/84 4x4     1100/82 1x1
       1100/82 2x1     1100/62 MP 2x1
       1100/62 2x2     1100/61 1x1

Figure 1. SPERRY ACQUISITION HISTORY

alternatives being considered. To provide the
best tradeoff between maximizing competition and
avoiding the $30 to $40 million cost of
converting DMA's entire applications software inventory
to a new target machine, DMA selected the
following strategy: upgrade of the SPERRY CPUs,
memory, card equipment and printers; competitive
acquisition of tapes, disks, and terminals;
software redesign; competitive contracts for data
base conceptual design and implementation; local
area networking; and technical support services.

In granting the Delegation of Procurement
Authority (DPA) [11]¹, GSA suggested that the
Agency implement a software improvement program
to ensure that DMA will establish an environment
which will foster competitive conditions for
subsequent procurements. DMA had already
recognized a need to improve software and the software
development environment and had several on-going,
related, independent activities in progress.
Included in these activities were several
Research & Development activities in the area of
software development tools.

The Software Improvement Program is intended
to consolidate into a single coordinated program
many on-going, related activities which have been
developing independently. The plan builds on
prior Center accomplishments to avoid duplication
of effort and to benefit from lessons learned
during previous activities [12]. It will
initially be implemented for the SPERRY 1100
systems but will later be extended to the
minicomputer environment as well.

The operational concept for SPERRY upgrade
for the S&T systems calls for transitioning the
Agency from a centralized, batch-oriented data
processing environment to an interactive
processing environment. This includes: acquisition of
approximately 500 interactive terminals (HTC -
300, AC - 200); exploitation of data base
management concepts and networking; and conversion of
non-standard production software to ANS standard
languages (ANS FORTRAN X3.9, 1978; ANS COBOL
X3.23, 1974). Transition from the current
batch-oriented environment to an interactive
environment alone is a significant task. This task
is complicated by production software
deficiencies, lack of required skills, insufficient
automated data processing (ADP) staff, and the
absence of a modern programming environment.
Several recent independent studies [4,5,6] have
identified serious deficiencies in DMA's
production software. These deficiencies may be
categorized as follows:

1. Multiple versions of production programs.

2. Non-ANS Standard (therefore, nonportable)
code.

3. Obsolete coding practices resulting in
software which is difficult to maintain.

4. Logical design which is hardware-dependent
and inefficient.

5. Poor end-user interface.

6. High error-off rate (much of which results
from poor user interface).



Most DMA software developers have received
formal university training in disciplines other
than computer sciences/software engineering
(e.g., physical science, mathematics, earth
science, cartography, geography, geodesy,
photogrammetry, etc.) and few have training/experience
in designing interactive software systems. There
is insufficient ADP manpower to perform a massive
software redesign while simultaneously
supporting normal DMA production. Finally, DMA is in
only the initial stages of introducing those
tools and techniques which constitute a Modern
Programming Environment (MPE). Therefore, we
have initiated the SIP and directed it at three
areas:

1. Software upgrade (i.e., software cleanup
and software redesign).

2. Upgrading of software development personnel
skills.

3. Introduction of an MPE into DMA.

4. Software Upgrade 

Improvement of the existing SPERRY 1100 S&T
software will require three major tasks:

1. Inventory of existing software

2. Cleanup of selected existing COBOL and
FORTRAN software



¹ Figures in brackets indicate the literature references at the end of this paper.

3. Redesign of selected software.

Centers have compiled and continue to refine
inventories of existing SPERRY S&T software.
Centers are identifying candidates for cleanup
and/or redesign using such criteria as: the
anticipated life of the software; whether and in
what time frame the software is to be off-loaded
from the SPERRY 1100; frequency of software use;
and criticality of the software to the DMA
mission.

The "cleanup" will consist of five major
activities: baseline definition, translation,
restructuring, validation, and documentation.
Center personnel and/or contractors may
accomplish cleanup of software. To the extent
possible, cleanup is to be accomplished with automated
tools rather than manually. The ultimate goal of
the cleanup effort is that all SPERRY S&T
software will be structured, free of vendor
extensions, conformant to ANS standards, and
documented according to DoD standards.

Baseline definition refers to comprehensive 
testing of software with retention of results for 
future comparisons. 

Translation is defined as the conversion of
existing COBOL or FORTRAN code to ANS COBOL
X3.23, 1974, or ANS FORTRAN X3.9, 1978 (FORTRAN
77) respectively, with the removal of vendor
extensions. The removal of vendor extensions
will in no case result in the loss of function.
Once the existing SPERRY 1100 software has been
converted to the ANS standard, future conversion
to a non-SPERRY mainframe or conversion to the ANS
subset for minicomputer systems will be more
easily accomplished. Translation to ANS Standard
will result in more portable,
non-vendor-dependent code. DMA does not have a
translator tool to accomplish this task. Manual
translation would be extremely labor-intensive
and error prone. Therefore, DMA plans to require
"software redesign/cleanup contractors" to
provide such a tool.

Restructuring is the changing of nonstructured
code to structured code (i.e., code
containing only the following logic structures:
Sequence Block, IF-THEN-ELSE, DO UNTIL, DO WHILE,
CASE). Restructuring reveals the structure of an
algorithm so that its existing code may be
maintained, modified, or documented. The
restructuring process does not change program
logic, and it does not redesign "spaghetti."
However, the structured code generated by the
structurizer is more readable than the original
code and, therefore, is more maintainable.

DMA owns a FORTRAN Automated Verification
System (FAVS) [8,9] which provides a FORTRAN
structurizer as one of its major subsystems.
FAVS is currently being made ready for production
use via a maintenance contract with the tool
developer, General Research Corporation (GRC).
Similarly, a COBOL Automated Verification System
(CAVS) [2] is being developed for DMA. A
restructuring capability for COBOL is an option
which DMA may exercise.

The validation task refers to the duplication 
of baseline testing for cleaned up software to 
ensure that the process did not introduce errors. 
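
The relationship between baseline definition and validation can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions: programs are modeled as plain functions, and the names `record_baseline` and `validate_against_baseline` are hypothetical rather than part of any DMA tool.

```python
def record_baseline(program, test_inputs):
    """Baseline definition: run the original program and retain its results."""
    return [(value, program(value)) for value in test_inputs]


def validate_against_baseline(cleaned_program, baseline):
    """Validation: rerun the baseline tests on the cleaned-up program.

    Returns a list of (input, expected, actual) for every mismatch.
    """
    mismatches = []
    for value, expected in baseline:
        actual = cleaned_program(value)
        if actual != expected:
            mismatches.append((value, expected, actual))
    return mismatches


def original(x):
    return x * x + 1


def cleaned_ok(x):       # restructured, same behavior
    return 1 + x ** 2


def cleaned_bad(x):      # cleanup accidentally dropped the constant
    return x * x


baseline = record_baseline(original, [0, 1, 2])
ok_report = validate_against_baseline(cleaned_ok, baseline)
bad_report = validate_against_baseline(cleaned_bad, baseline)
```

An empty report indicates that translation and restructuring left observable behavior unchanged for the retained test cases; it says nothing about inputs outside the baseline.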

The software documentation task consists of 
both automatic and manual generation of documen- 
tation. 

During software cleanup, missing or inadequate
documentation will be augmented with both
automatically and manually generated materials.
Automatically generated reports about code, such
as static analysis (ANS standard violations, flow
analysis errors, portability and flow metrics);
complete layout of all files; detailed
cross-reference of all statements; and a map of
data usage in the COBOL procedure division, will
be produced as software is cleaned up/redesigned.
The purpose of the reports is to assist
maintenance programmers in reading and analyzing code
and in controlling the impact of program
modification.

The FAVS will provide the following
documentation for FORTRAN code: invocation summary,
common matrices, input/output statements, and
cross-reference of external variables for
multiple modules; and symbol reports, cross-reference
reports, invocation space reports, and invocation
bands reports for individual modules [8]. CAVS
(when available) will generate the following
reports: an indented listing of COBOL source;
cross-reference of calling and called programs;
cross-reference of program and file interaction;
cross-reference of program and copy text
instruction showing where copy texts are used;
cross-reference of program versus linkage section
contents; reports showing where all identifiers
are defined, set, and used; and a cross-reference
of identifiers by record position and programs,
showing fields defined, set and used, and when
and where identifier names differ [2].

During the inventory phases, Centers
identified documentation available for application
programs. Missing user documentation is to be
written ("manually") during the redesign effort.
In cases where contractors perform software
redesign, the contractor will prepare such missing
documents. In cases where required improvement
includes "only" enhancing/preparing
documentation, in-house effort will be used.

To allow the Centers to solve a variety of
software problems, redesign is broadly defined as
any appropriate combination of the following:

1. Rewrite all or portions of the code for
interactivity.

2. Redesign of the user interface (leave the
code untouched while creating a new "front
end" to improve user interface).

3. Optimization of those portions of code which
consume the greatest resources (based on
results of instrumentation and use of
metrics).

4. Rewrite portions of the code to increase
reliability.

5. "Scrap" the existing program and rewrite the
algorithm using the "structured code" as a
basis for understanding.



5. Approach

The DMA approach to the software improvement
effort is to contract for the redesign/cleanup of
selected software. DMA intends to award Basic
Ordering Agreements (BOAs) to all contractors
possessing certain corporate experience,
personnel experience, and software tool
access/experience. A Request for Proposals (RFP)
was issued in 1982 for software redesign/cleanup
and proposal evaluation is in progress. On a
case-by-case basis, DMA will issue delivery
orders describing the desired redesign. A firm
fixed-price contract will be awarded for each
delivery order on the basis of technical approach
and cost. The contractor may be tasked with
cleanup only or cleanup plus desired redesign
activities. The contractor is being required to
use automated tools to translate code to ANS
Standard and restructure it. New code is to be
written using only structured programming
constructs. In some cases, the contractor will
redesign software that has been cleaned up
in-house. In others, unstructured, untranslated
code will be the contractor's input, especially
during the early stages of the effort. Since DMA
is concurrently attempting to introduce several
tools into the software development environment
(e.g., FORTRAN precompiler), contractors will
be required to interface with them. For example,
a contractor writing structured FORTRAN would be
required to use those constructs and delimiters
acceptable to the DMATRAN precompiler.

6. MPE Implementation 

The improvement (cleanup and redesign) of DMA
software will consume a significant amount of
resources over a period of five years. To obtain
maximum return from this investment, DMA must
take actions to ensure that
modification/maintenance of the improved
software does not result in the introduction of the
deficiencies discussed above. Moreover, all new
software developed must be of the same (or
higher) quality as the improved software.
Therefore, DMA is concurrently introducing a SPERRY
1100 Modern Programming Environment (MPE). The
MPE will include a centralized Production Program
Library which will be the repository for
production programs and documentation (in human
and machine readable form). As each software
system is cleaned up and/or redesigned, it will
be migrated into the production program library
at the appropriate Center and will be placed
under control of the newly formed Configuration
Control Board (CCB).

Tools to support an MPE implementation plan
may be grouped into three general categories:
(1) conversion aids for software, (2) management
aids for existing software, and (3) productivity
assistance tools. Three categories of conversion
aids are considered in this plan: static
analyzers, precompilers, and structuring engines. The
FORTRAN Automated Verification System (FAVS)
provides each of these tools. FAVS deficiencies
are currently being corrected so that the tool
can be introduced for production use. Similarly,
the CAVS will be made available for production
use once it is production-ready. A COBOL
precompiler which will interface with CAVS is also
available. The term precompiler is used here to
refer to a tool which simplifies the task of
writing structured code in such languages as
FORTRAN and COBOL which do not support all of the
structured figures.

Two classes of tools to manage existing soft- 
ware (whether developed in-house or by contrac- 
tors) are being considered by DMA. 

The first class of such tools is the code
auditor to automatically check for adherence to
Center standards [10] for structuring and ANS
standards. Use of such a tool would allow DMA to
avoid the more labor-intensive and error-prone
manual methods. However, acquisition of a tool
is not anticipated prior to introduction of
standards.
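
The kind of check a code auditor performs can be illustrated with a very small Python sketch. The single rule used here (flagging GO TO statements in fixed-form FORTRAN source) is a hypothetical example chosen for illustration; it is not taken from the Center standards [10], and the name `audit_fortran` does not refer to any existing tool.

```python
import re

# Matches "GO TO" or "GOTO" as a whole word, in either case.
GOTO_PATTERN = re.compile(r"\bGO\s*TO\b", re.IGNORECASE)


def audit_fortran(source):
    """Return (line number, text) for each non-comment line using GO TO."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        if line[:1] in ("C", "c", "*"):   # fixed-form comment line
            continue
        if GOTO_PATTERN.search(line):
            findings.append((line_number, line.strip()))
    return findings


SAMPLE = (
    "C GO TO in a comment is ignored\n"
    "      IF (X .GT. 0) GO TO 10\n"
    "      Y = 1\n"
    "   10 CONTINUE\n"
    "      GOTO 20\n"
)
findings = audit_fortran(SAMPLE)
```

A production auditor would need a real parser, since a line-based pattern cannot distinguish statements from character constants or continuation lines.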

The second class of tools being considered
are configuration control tools which
automatically track changes to software and permit only
authorized changes to an official version of
software. DMA plans to investigate acquisition
of such a tool to support the CCB activities. In
the interim a manual system is being implemented.

DMA has a great deal of batch-oriented
software which requires some form of interactivity.
Two approaches can be taken when introducing
interactivity. A separate user interface can be
written for each program. A second approach is
to provide a dialog manager capability to
interface with the operating system rather than using
COBOL and FORTRAN to do this interface. DMA
plans the second approach and has acquired the
SPERRY Display Processing System (DPS 1100) which
separates the development and use of predefined
screens from the application program itself.
Center personnel are currently learning to use
this tool.

The third major phase of implementing DMA's
SPERRY 1100 MPE is the introduction of Standards
(which apply to all mainframe and minicomputer
software development) into the Centers.

DMA has issued a four-volume set, "DMA
Software Life Cycle Standards", which is tutorial in
nature, takes into consideration Center
differences, conforms to DoD Automated Data Systems
(ADS) documentation standards (DoD 7935) and
reflects state-of-the-art ADP software
practices. It consists of: DMA Software Design
and Implementation Standards Manual (SDISM);
Structured Programming in FORTRAN [1*];
Structured Programming in COBOL [13]; and Structured
Walk-Through Guidelines. Volumes II and III
detail the simulation of basic structured
programming constructs and thus allow generation
of "structured code" without the use of
precompilers. Volume IV provides general guidelines
for structured walk-throughs for all software
life cycle phases. Following Center review, the
standards will be introduced in a phased manner.
An additional document, "Software Contracting
Guidelines", is being developed to assist non-ADP
personnel in contracting for software.

7. Skills Upgrade

The third major area addressed by the SIP is
the upgrading of the skills of both managers and
software developers. Areas of emphasis for
managers are: quality assurance, managing
structured programming projects, project management
and control, state-of-the-art awareness,
productivity assurance, and contracting for software.
Managers must understand the concepts and methods
employed in an MPE since they differ from those
of the traditional software projects.

Four areas of training will be emphasized for
software developers: (1) SPERRY skills, (2)
state-of-the-art awareness, (3) MPE
introduction, and (4) new technology. As with
management training, a variety of methods (e.g.,
lectures, seminars, laboratory, video cassette,
on-site) will be used. Training topics include:
the structured software life cycle; applying
standards; use of tools (e.g., FAVS, DMATRAN);
designing interactive systems; using terminals;
SPERRY refreshers; data base maintenance; query
language/report generators; structured life
cycle standards; redesigning existing software;
and optimization techniques, networking,
communications and graphics.

8. Summary 

In summary, DMA is committed to a five-year
Software Improvement Program encompassing three
major areas: introduction of a modern programming
environment, improvement of existing software,
and upgrading development and management skills
to support the new environment. General policy
includes: adoption/use of structured programming
as a standard, maximizing the use of tools to
facilitate software development, compliance with
ANS COBOL and FORTRAN standards,
establishment/use of a centralized program
library, adoption/enforcement of software life
cycle standards, elimination of multiple versions
of common software, establishment of quality
assurance groups to ensure adherence to
standards, introduction of a Configuration
Control Board, introduction of interactivity, and
a phased approach to software cleanup/redesign in
which high priority software is cleaned
up/redesigned first.

Successful implementation will provide the
following benefits:

1. A competitive environment in which DMA will
not be locked into a single hardware vendor
because of the difficulty/costliness of
software conversion.

2. A standard software base in which production
software is identifiable and maintainable.

3. Standard software development practices
within and between Centers.

4. Tools to improve productivity.

5. A modern environment offering increased job
satisfaction to software developers.

ADP organizations are plagued with high maintenance costs, long delays in
responding to users' changing needs, and continued development and maintenance
of antiquated, outmoded, and relatively obsolete software. This software can be
thought of as being in an advanced state of software senility, a degenerative
condition which, if not corrected, will eventually render the software totally
useless. A reversal of this situation requires a Software Improvement Program
(SIP), which is a treatment for the ills of software senility and offers a cure
for many of the software problems from which most ADP organizations are
suffering. A SIP is an incremental and evolutionary approach to modernizing
software to maximize its value, quality, efficiency, and effectiveness, while
simultaneously preserving the value of past software investments and enabling
the organization to capitalize on today's modern ADP technology, as well as
future technological advances in the field. This paper describes the SIP
philosophy and presents a strategy for implementing a dynamic, ongoing SIP,
coupled with a sound Software Engineering Technology (SET), to attack the
causative factors of the ever-growing software crisis.

Key Words: Software Engineering Technology (SET); software improvement;
Software Improvement Program (SIP); software obsolescence; stepwise refinement.

1. Need for Software Improvement

Over the past several decades there have 
been substantial changes in the automatic data 
processing (ADP) industry. There have been 
dramatic increases in hardware productivity, 
with a significant decrease in the footprint of 
the hardware configuration due to its reduced 
size, component modularity, and lowered air- 
conditioning and electrical consumption. 
Simultaneously, total ADP costs have continued 
to rise with the largest costs shifting from 
hardware to software. This shift in costs is 
primarily due to substantial automatic data 
processing equipment (ADPE) price reductions, 
coupled with increased personnel costs for 
software development and maintenance activities. 

During this same period of high-powered,
low-cost, rapidly advancing ADPE, many ADP
organizations are facing a software crisis.
Software activities are still labor intensive,
with little increase in productivity being
realized in software production and maintenance.
Resource utilization has shifted from software
development activities to maintenance, with over
half of all software personnel involved in
correcting software errors, modifying software
to change its functions or extend its life, and
simply keeping the software operational [1].

Most existing government software is well
over a decade old, with some as much as twenty
to twenty-five years old. Much of the software
was originally written on second-generation
hardware and operating systems, in
machine-dependent and nonstandard languages, and has
undergone several hardware, operating system,
and language conversions. Most of this software
was written with little or no utilization of
software design, programming, or testing
standards, guidelines, or procedures; required
substantial operator intervention; utilized
sequentially accessed card and tape input and
output files; and had minimal, inadequate, or in
some cases, a total lack of documentation.



Figures in brackets indicate the
literature references at the end of this paper.

Embedded in this aging software were
home-grown system utility and operating system
features such as sorts, merges, record buffers,
copies, and manual restarts. These features
were included, of necessity, in the software
because most of the features of modern software
package utilities and operating systems, which
we now take for granted, were not available as
packages or in operating systems of that day.
Many of these home-grown utilities and operating
systems are no longer supported by the
developing organization or the vendor, nor is there a
readily available and adequate pool of
programmers for maintenance of this software.

In the past, bigger or more powerful ADPE
configurations, or emulation or simulation, have
been a quick fix for these software problems.
But increasingly, the hardware fix for the
software aches and pains has been found to be a
fleeting panacea, or a temporary solution at
best; and today's modern systems cannot, and do
not, support emulation or simulation of the
older programming features and practices.

Coupled with these problems of aging
software, ADP organizations are plagued with
high maintenance costs, long delays in
responding to users' changing needs, and continued
development and operation of antiquated and
underpowered computer software. Productivity
increases for ADP organizations with these
problems are severely limited, if not impossible
to attain, due to the proliferation of archaic
software analysis, design, coding, and testing
features and techniques; low-level and
nonstandard languages; machine or environment
dependencies; and custom-written utilities.

This antiquated, aging, outmoded, and
relatively obsolete software is in need of
modernization. While this software cannot be
termed totally obsolete, because it is still
operational, it can be thought of as being in an
advanced state of software senility. Software
senility is a degenerative condition which, if
not corrected, will eventually render the
software totally useless.

In view of the many and complex
aforementioned software problems, and the emerging
trend that the software crisis will continue to
grow and worsen, a quick fix or single solution
to the problems is not feasible, and a direct
conversion from the problem environment to a
modern ADPE system and environment is virtually
impossible. To solve these problems and combat
the software crisis, a program must be
instituted to preserve the value of past software
investments as much as possible, and provide an
incremental and evolutionary approach to
modernizing the existing software to maximize its
value, quality, effectiveness, and efficiency.

Such a software improvement program (SIP)
is described herein as a treatment for the ills
of software senility, and offers a cure for many
of the software problems from which today's
government ADP organizations are suffering.
Institutionalization of a sound software
engineering technology (SET), coupled with a
dynamic, ongoing SIP, can attack the causative
factors of the software crisis and provide the
government with viable, modernized, effective,
efficient, and high quality ADP systems, capable
of capitalizing on today's modern ADP
technology, as well as future technological advances
in the field.



2. Goals of a SIP

There are many goals for a SIP to achieve.
The most important of them are to:

improve software maintenance and
control;

reduce delays in responding to users'
needs;

improve software quality;

increase programmer productivity;

decrease software maintenance costs;

institutionalize processes;

change software from a reactive to
proactive state;

extend the software's life; and

put the organization in a position to
take advantage of new and emerging
technology.

However, the end goals of a SIP are not
only to improve software maintenance and
control, but also to achieve as much isolation
of functions and standardization of interfaces
within the software systems as possible. The
achievement of these goals is attained through:

isolating system functions;

allowing for interchangeability of
system functions; and

facilitating change of elements within
functions.

Isolating system functions through
modularization is a natural step towards avoiding
reliance upon one architecture or environment,
and increases software maintainability and
understandability. As functions are isolated,
more design alternatives present themselves and
further possibilities of segmentation emerge.
Thus, isolation of function holds the key to
selecting cost-effective and efficient system
alternatives in the future.

Functions should also be interchangeable
with alternative design realizations to
facilitate functional interfaces. This
interchangeability of function, usually achieved
through the use of reusable and standardized
modules of code, ensures an easier change in the
means of performing a function. For example, a
called module that accesses a tape file can be
exchanged for a called module that accesses a disk
data set.
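
The tape-for-disk exchange described above can be sketched in a modern
language. The following Python fragment is an illustration only, not part of
the original paper: the names RecordSource, TapeFileSource,
DiskDataSetSource, and count_records are hypothetical, and the "devices" are
simulated with in-memory lists. It shows a calling program that depends only
on a standardized interface, so either access module can be substituted
without changing the caller.

```python
from typing import Iterator, List, Protocol


class RecordSource(Protocol):
    """Standardized interface (hypothetical) that every access module honors."""

    def read_records(self) -> Iterator[str]:
        ...


class TapeFileSource:
    """Stand-in for a called module that reads a sequential tape file."""

    def __init__(self, records: List[str]) -> None:
        self._records = records

    def read_records(self) -> Iterator[str]:
        # Tape access is strictly sequential: yield records in stored order.
        for record in self._records:
            yield record


class DiskDataSetSource:
    """Stand-in for a called module that reads a blocked disk data set."""

    def __init__(self, blocks: List[List[str]]) -> None:
        self._blocks = blocks

    def read_records(self) -> Iterator[str]:
        # The data set is modeled as blocks; unblock them into records.
        for block in self._blocks:
            for record in block:
                yield record


def count_records(source: RecordSource) -> int:
    """Calling program: depends only on the interface, not on the device."""
    return sum(1 for _ in source.read_records())
```

Because count_records never names a device, replacing TapeFileSource with
DiskDataSetSource is a change to one called module rather than to every
program that reads the file.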

Facilitating the change of elements within
functions refers to the software's portability
and maintainability. Better portability and
easier maintainability through the use of
single-function, standardized, and reusable
modules of code are paramount to achieving the
goals of a SIP. Easier changeability of the
software and its functions ensures more
efficient use of the key data
processing resources, especially people and
machines. Standardizing, modularizing,
parameterizing, and documenting the software are
several techniques that aid in facilitating the
change of elements within functions.

Improving the software's quality (i.e.,
making the software better) is probably the
best available means of achieving the SIP's
goals. Software quality is a measure of its
excellence, worth, or value against some ideal
or standard. Although quality is an ill-defined
term, there are many specific properties, or
attributes, by which it can be defined or
measured [2]. Figure 1 illustrates a proposed
hierarchy of the major software quality
attributes and their subordinate attributes. Although
the subattributes are listed under only one
major attribute, it must be stressed that
several of them could conceivably be listed
under more than one major attribute. For the
sake of clarity and to minimize
misunderstandings, each subattribute has been listed
only once, under the major attribute with which
it is most often associated.

It must be noted that it is rarely possible
for all software quality attributes or
subattributes to be implemented. It is first necessary
to define the SIP's goals, and then the
improvement objectives for each individual software
application. Appropriate tradeoff decisions must
then be made among the various quality
attributes and subattributes, and the goals and
objectives to be achieved. For example, some
processing efficiency may have to be sacrificed
to achieve more maintainability, and vice versa.
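
One way to make such a tradeoff explicit is a weighted score. The sketch
below is an assumption for illustration, not a method given in the paper:
the function name weighted_quality_score, the 0-10 rating scale, the
weights, and the two candidate designs are all hypothetical. The weights
stand for the SIP's stated priorities among the quality attributes.

```python
from typing import Dict


def weighted_quality_score(ratings: Dict[str, float],
                           weights: Dict[str, float]) -> float:
    """Weighted average of attribute ratings (an assumed scoring rule)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Attributes missing from the ratings are scored as zero.
    weighted_sum = sum(weights[name] * ratings.get(name, 0.0) for name in weights)
    return weighted_sum / total_weight


# Hypothetical priorities: maintainability matters most, efficiency least.
weights = {"reliability": 3.0, "efficiency": 1.0,
           "portability": 2.0, "maintainability": 4.0}

# Hypothetical ratings (0-10) for two ways of improving one application.
hand_tuned = {"reliability": 7, "efficiency": 9,
              "portability": 4, "maintainability": 3}
modular = {"reliability": 7, "efficiency": 6,
           "portability": 7, "maintainability": 8}
```

Under these assumed weights the modular design scores higher even though it
gives up some efficiency, which mirrors the tradeoff described above. A
different set of goals would change the weights and possibly the outcome.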

3. Description of a SIP 

A SIP can be thought of as a preventive
maintenance program for software. As with ADPE
preventive maintenance, software must be
periodically and systematically cleaned up,
fine-tuned, optimized, and enhanced to keep it
in working order and capable of fulfilling its
current and future requirements.

The concept of software improvement is not
new; rather, it is an outgrowth of normal
day-to-day software maintenance projects and an
extension of conversion projects. Some types of
software improvement, such as realigning code
and implementing naming conventions, are
presently performed concurrently with such
everyday tasks as software modification or
maintenance, or for conversion from one computer
configuration to another. However, these
improvements are traditionally performed on a
random, piecemeal basis, without structure or
overall software or organizational
considerations.

Such improvement decisions are usually made
by the individual programmer without the benefit
of managerial input. This bottom-up, piecemeal
approach to software improvement is
unstructured, and usually results in an unsuccessful
attempt to cohesively improve the current
software and acquire and use modern tools and
techniques.

In contrast to the traditional, single- 
purpose software improvement approach, the 
modern software improvement approach, as 
described hereinafter, actually serves multiple 
purposes. Under the modern software improvement 
approach, improvements are not performed to meet 
only a single need or objective, but rather to 
accomplish several objectives and reconcile 
multiple problem areas. Also, the decisions as 
to when software improvement is needed and what 
types of improvements are needed are not left to 
the individual programmer or analyst, and the 
improvements are not performed in a casual 
manner. Rather, these decisions and the
improvement performance are institutionalized as a
formal process to which all programmers and
analysts must adhere. 

While the software improvement concept may
not be new, the software improvement approach to
building or improving systems is innovative and
more sophisticated than the conventional and
more simplistic software life-cycle approach.
The software improvement approach to building
and improving systems [1] is built on the key
assumptions that:

most major ADP organizations have a
decade or more of investment in
software;

most Federal organizations are almost
entirely dependent on their software to
meet their mission;

keeping software operational is
difficult enough without deviating from that
baseline of software to add enhancements
or change functions through major
redesign or new development, which is
thought to be an uncertain and risky
business; and

there is a need to support new
applications to keep ADP costs low and
service levels high.

[Figure 1: hierarchy diagram. Attribute labels recoverable from the diagram:
Useability, Reliability, Efficiency, Portability, Maintainability,
Dependability, Accuracy, Implementability, Independence, Uniformity,
Modularity, Testability, Reusability, Modifiability, Understandability,
Conciseness, Simplicity.]

Figure 1. Hierarchy of Software Quality Attributes

The software improvement approach to
improving existing systems, or building new
systems from existing software, is different
from the conventional system development
approach in that it recognizes:

the existence and characteristics of the
current systems that support day-to-day
operations;

the existence of other operational
systems that may be integrated to
replace functions in an existing
system;

the inherent problems in engineering new
code in any quantity;

that existing systems are frequently the
only specification of existing
processes;

the need to preserve the testing
integrity of the current system while
moving to a new or improved system;

that many faults or deficiencies in an
existing system can be accurately and
cost-effectively corrected by improving
it;

the need for an orderly, incremental
approach to the building or improving of
a system that allows for adequate
testing, manageable pieces, constant
achievement and growth, and progress
feedback;

the need for a useable version of the
new or improved system at each stage of
development or improvement, allowing for
rapid capitalization upon the new system
and its components; and

the virtual impossibility of completely
re-engineering or redeveloping very
large systems within a reasonable time
frame [3].

The universe of software, from which a 
desired application can be built or improved on, 
can be conceived as a triangle, as illustrated 
in Figure 2. 

At the apex of the triangle is all of the
"software that currently exists" in production
today. This software performs the functions
that the organization needs to conduct its
day-to-day business. Because this software is
already working and tested, it has an intrinsic
value to the organization and represents the
vested interest an organization has in its own
software applications. Existing software is
usually salvaged, transferred, and incorporated
into a new or improved system by purging any
undesirable or unnecessary software, leaving
some of the software as it is, and improving the
remaining software through conversion,
refinement, and enhancement activities. While it is
typically the easiest and most accurate to test
and the least costly to produce, this software
is often the most expensive code to maintain
because it is usually undocumented and built in
a "patchwork" fashion.

At the bottom left-hand corner of the
triangle is "other operational software that
exists" in other organizations. This external
software represents the software packages
available from industry or other ADP
organizations. While this software may suffer from some
deficiencies, it may be modified to fit the
organization's needs. Also, many software
packages are highly maintainable, well
documented, and quite portable, and may not
require extensive, if any, modification.
External software is usually incorporated into a
system by replacing existing code with an
existing package. This software is typically
somewhere between existing and new software in
cost, accuracy, and maintainability, depending
on the package's functions and its level of
sophistication.

Finally, at the bottom right-hand corner is 
"new software," which does not yet exist and 
must therefore be engineered. While this code
is typically the easiest to maintain because it
is state-of-the-art and newly documented, in
terms of accuracy it is normally the most
difficult to engineer and the most risky to 
undertake because there is no existing baseline 
from which to test or measure. It is also the 
most costly to produce because it must be 
engineered from "scratch." This software should 
be incorporated into a system as a last resort, 
if transfer of the existing software or replace- 
ment with a package is not feasible. Neverthe- 
less, any new software should be engineered 
using modern programming practices to ensure 
software that is well documented; fits the 
application better; is easy to support, read, 
understand, modify, and enhance; and is less 
expensive and time consuming to maintain. 

The software improvement approach to
improving an existing system or constructing a
new one is thus one of:

determining, on a case-by-case basis, the
source (e.g., existing internal
software, existing external software
package, or new software) of the
software or software subpiece;

determining the actions required to
modify and/or implement the software
(e.g., purge the software from the
system; salvage the current software by
leaving it alone and moving it as it is;
salvage the current software by
improving it through conversion,
refinement, and/or enhancement; replace
the current software with an external
software package; or redesign/newly
develop the software);

assessing the software's source and
actions required against the costs,
benefits, and risks of each; and then

developing a strategy or plan for the
software's purge, transfer, improvement,
integration, and/or redesign/new
development into the improved or newly
constructed system.

Figure 2. Software Improvement Approach
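
The assessment step in the list above (weighing each source and action
against its costs, benefits, and risks) can be sketched as a small
comparison. The Python fragment below is illustrative only: the Option
record, the scoring rule in net_value, and every number in the example are
assumptions introduced here, not figures or formulas from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Option:
    source: str    # "existing", "external package", or "new"
    action: str    # e.g. "improve", "replace", "redevelop"
    cost: float    # estimated cost, in arbitrary units
    benefit: float # estimated benefit, in the same units
    risk: float    # chance of failure, 0.0 (none) to 1.0 (certain)


def net_value(option: Option) -> float:
    """Risk-discounted benefit minus cost (an assumed scoring rule)."""
    return option.benefit * (1.0 - option.risk) - option.cost


def choose_option(options: List[Option]) -> Option:
    """Pick the option with the highest net value for one software piece."""
    if not options:
        raise ValueError("at least one option is required")
    return max(options, key=net_value)


# Hypothetical estimates for a single software subpiece.
candidates = [
    Option("existing", "improve", cost=40.0, benefit=100.0, risk=0.1),
    Option("external package", "replace", cost=60.0, benefit=110.0, risk=0.2),
    Option("new", "redevelop", cost=120.0, benefit=150.0, risk=0.4),
]
```

With these assumed estimates, improving the existing software wins and new
development comes last, which is consistent with the paper's advice to treat
new software as a last resort. The decision is made per software piece, so
different pieces of one system may end up with different sources.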

In summary, four basic advantages of the
software improvement approach to improving or
building a system over the conventional system
development approach are that it:

minimizes uncertainty and risk by
maximizing the utilization of testable
components;

preserves the value of past software
investment as much as possible and
avoids the dangers of failure inherent
in the "tear-it-down-and-start-anew"
approach;

enables the project to be broken into
small, manageable pieces with an
operational system at each phase; and

is iterative in nature, which allows for
the tasks performed to be repeated in an
orderly, incremental fashion with
constant achievement, growth, and
feedback, until the overall objectives
of the project are met.

4. Stepwise Refinement 

Because of the large amount of software
that exists in most ADP organizations, most
software can't be improved in one "lump sum."
Thus, the software must be divided into smaller,
more manageable increments that progress
through the software improvement process as a
work unit. Increments can be based on system,
subsystem, or project boundaries, or on
functional areas (e.g., input, edit, file update,
report generation, or error handling). The key
is to subdivide the software so as to minimize the
interfaces between the groups. The absence, or
minimization, of increment interfaces makes the
improvements for each increment more
independent, and allows the concurrent improvement of
several increments at a time.
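
The idea of minimizing interfaces between increments can be sketched with a
simple graph model. This is an illustration under stated assumptions, not a
procedure from the paper: modules are nodes, each interface is an undirected
pair of module names, and the function names and sample data are
hypothetical. Groups of modules with no interfaces between them are natural
candidates for increments that can be improved concurrently.

```python
from typing import Dict, List, Set, Tuple


def count_cross_interfaces(interfaces: List[Tuple[str, str]],
                           assignment: Dict[str, str]) -> int:
    """Count interfaces whose two modules fall in different increments."""
    return sum(1 for a, b in interfaces if assignment[a] != assignment[b])


def independent_groups(modules: List[str],
                       interfaces: List[Tuple[str, str]]) -> List[Set[str]]:
    """Group modules so that no interface crosses between groups."""
    neighbors: Dict[str, Set[str]] = {m: set() for m in modules}
    for a, b in interfaces:
        neighbors[a].add(b)
        neighbors[b].add(a)
    groups: List[Set[str]] = []
    seen: Set[str] = set()
    for start in modules:
        if start in seen:
            continue
        # Depth-first walk collects everything reachable from this module.
        group: Set[str] = set()
        stack = [start]
        while stack:
            module = stack.pop()
            if module in group:
                continue
            group.add(module)
            stack.extend(neighbors[module] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical system: functional areas and the interfaces between them.
modules = ["input", "edit", "update", "report", "errors"]
interfaces = [("input", "edit"), ("edit", "update"), ("report", "errors")]
```

Here the sample system splits into two groups with no interface between
them. Real systems rarely separate so cleanly; count_cross_interfaces can
then be used to compare alternative subdivisions and prefer the one with the
fewest crossing interfaces.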

Also, because of the vast differences that
may exist between the current and desired
software environments, most needed improvements
can't be accomplished in one "quantum leap."
Thus, the improvements for each increment are
accomplished in multiple steps or releases [3]
of logically related sets of improvement
activities that are performed at one time.
Improvements are normally subdivided by activity
type into the three basic releases of
conversion, refinement, and enhancement, as depicted in
Figure 3.



As illustrated in Figure 3, the software
improvement activities under these three
releases range from translation of code
to complete re-engineering of existing systems.
The software improvement activities flow from
one release to the next; there is no
"clear cut" dividing line between each release,
and some functional overlap is inevitable.



The decisions as to the number of releases
necessary to improve each increment, and the
improvement activities to be performed in each
release, are dependent on the size of the
increment, overall number and type of
improvements required, and priority of the
improvements. Improvement activities can be combined
into one large release, or further subdivided
into multiple mini-releases, as illustrated in
Figure 4. Figure 4 also illustrates the stepwise
refinement approach to upgrading and modernizing
the software, with continual advancement and the
opportunity between each step to reevaluate the
SIP plans and results, and to introduce changes
as necessary.

Software improvement conversion activities
transform the software, without functional
change, standardizing it and making it
environment independent. Without standardization and
independence, the next two releases, refinement
and enhancement, would be extremely difficult,
if not impossible, to accomplish. Standardized
software that is as independent as possible
lends itself to manipulation by automated means
and proceduralized processes, and facilitates
flexibility for future requirements (e.g.,
moving to a new environment).

Software improvement refinement activities
modernize the software to a state-of-the-art
status and improve software maintainability and
programmer productivity. Refinement is a
prerequisite for software enhancement to ensure
enhancements are not being made to
unmaintainable software with obsolete coding features
(e.g., EXAMINE or ALTER statements in COBOL), or
outdated or incorrect functional requirements.
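
A refinement release might begin by locating the obsolete coding features
mentioned above. The following Python sketch is an assumption-laden
illustration, not a tool described in the paper: it scans fixed-format COBOL
text for the EXAMINE and ALTER verbs named in the text, skips lines with an
asterisk in column 7 (the fixed-format comment indicator), and does no real
parsing, so a verb inside a quoted literal would still be reported. The
function name and sample program are hypothetical.

```python
import re
from typing import List, Tuple

# Verbs the paper names as obsolete; extend this tuple as local standards require.
OBSOLETE_VERBS = ("EXAMINE", "ALTER")

# Match a verb only as a whole COBOL word: hyphens are part of COBOL names,
# so ALTER-FLAG must not be reported as an ALTER statement.
_VERB_PATTERN = re.compile(
    r"(?<![A-Za-z0-9-])(" + "|".join(OBSOLETE_VERBS) + r")(?![A-Za-z0-9-])",
    re.IGNORECASE,
)


def find_obsolete_statements(source: str) -> List[Tuple[int, str]]:
    """Return (line number, verb) pairs for each obsolete verb found."""
    findings: List[Tuple[int, str]] = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Column 7 (index 6) holding '*' marks a comment line; skip it.
        if len(line) > 6 and line[6] == "*":
            continue
        for match in _VERB_PATTERN.finditer(line):
            findings.append((number, match.group(1).upper()))
    return findings


# Hypothetical fixed-format fragment used only to exercise the scanner.
SAMPLE = "\n".join([
    " " * 7 + "PROCEDURE DIVISION.",
    " " * 6 + "* ALTER is mentioned here only in a comment",
    " " * 11 + "ALTER PARA-1 TO PROCEED TO PARA-2.",
    " " * 11 + "EXAMINE WS-FIELD TALLYING ALL SPACES.",
    " " * 11 + "MOVE ALTER-FLAG TO WS-OUT.",
])
```

A report like this gives the refinement release a concrete work list before
any enhancement is attempted on the same code.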

Software improvement enhancement activities
optimize the value, quality, efficiency, and
effectiveness of the software, enabling easier
technical redesigns, easier addition of modern
technological features and capabilities, and
more efficient and effective use of resources.
Without enhancement, the standardized and
modernized software may still not function
efficiently or effectively, or fulfill the
user's desired requirements.

Improvement activities do not have to be 
subdivided into these three basic releases or 
follow the suggested release flow. They can be 
combined into one large release, or further 
subdivided into multiple mini-releases. The 
decisions as to the number of releases necessary 
to improve each increment, and the improvement 
activities to be performed in each release, are 
dependent on the size of the increment, overall 
number and type of improvements required, and
priority of the improvements. 

5. Major Benefits of a SIP

The tasks required during the software
improvement process encompass a broad range of
activities, and at a minimum include:

performing a software inventory and
analysis;

developing SIP plan(s);

establishing engineering elements;

preparing work packages;

preparing test data sets;

developing software improvement release
specifications;

improving the software;

unit and system testing the improved
software;

documenting the software;

acceptance testing the improved
software; and

transitioning the improved software into
production.

From this list of tasks, it is clear that a
SIP is labor, management, machine-resource, and,
possibly, deadline intensive. However, in spite
of the problems that will inevitably arise, a
SIP can be successfully engineered and prove
highly beneficial to the organization.

The advantage of utilizing state-of-the-art
technological advances, such as teleprocessing,
data base management systems (DBMS), and mass
storage, is one such benefit. Also, a SIP
provides the capability to use this modern
technology without being "locked in" to
architectural or environmental dependencies.



Another benefit is the potential for
greater programmer productivity.
Existing software, after improvement, can be
maintained much more efficiently and the
programmer's span of control should be greatly
increased. That is, after software improvement,
a programmer can maintain significantly more
lines of code or system functions due to the
increased maintainability and understandability
of the improved software. The result is
increased availability of an organization's most
scarce resource: skilled programmers. A SIP
more efficiently uses key resources, both people
and machines. More readily available junior
personnel can be used for both new development
and maintenance, with improved productivity,
lower risk, and less training. The more senior
personnel can be used for more advanced tasks
such as systems design or analysis, or tool
evaluation and selection.

Additionally, the incorporation of a
Software Engineering Technology (SET),
consisting of a synchronized set of software
standards and guidelines, procedures, tools,
quality assurance, and training, implemented
through and coupled with a dynamic, ongoing SIP,
simplifies the learning required of programmers
and analysts. The simplified learning enables
the institutionalization of a single training
program for a common methodology and
consolidated goals and objectives.

More efficient use of the ADPE is also
possible because the state-of-the-art ADPE will
not simply emulate obsolete or out-of-date
functions. Rather, it will perform the
technologically advanced activities for which it was
designed.

Many additional benefits can be achieved
with a thorough, comprehensive, and
well-planned, -analyzed, and -managed SIP. Some of
these include:

improved user service levels;

more flexibility for future
requirements;

the capability for automatic
documentation and/or code generation;

enhanced error recovery, system
debugging, testing, data integrity, and
security features;

increased software quality (i.e.,
reliability, efficiency, portability,
and/or maintainability);

improved quality of software
end-products (e.g., reports, statistics, and
programs); and

a synchronized, formalized, and tested
SET for the SIP.

6. SIP and SET Interrelationship

As previously discussed, the SIP works
closely, and in tandem, with a SET. A SET, as
depicted in Figure 5, consists of a synchronized
group of five equally important software
engineering elements:

standards and guidelines;

procedures;

tools;

quality assurance (QA); and

training.

These five software engineering elements
direct and control all software activities
throughout the software's life cycle [4] and
for different software engineering or
re-engineering purposes (e.g., software
development, maintenance, improvement, conversion, or
redesign). Thus, the SET addresses, on an
organization-wide basis, the software
engineering methods, metrics, and latest controls
for managing an installation's software
activities, while the SIP addresses the upgrading of
the existing software (i.e., programs, modules,
job streams, files, and documentation) with
regard to the SET baseline.

The same five engineering elements of a SET
are required, in a specialized sense, for a SIP.
The establishment of these five engineering
elements as a formalized SET is paramount to the
successful implementation of a SIP. Improvement
and installation standards, guidelines, and
procedures are required to standardize the
software activities and the software, so they
can be measured, controlled, and improved.
Specialized improvement tools are both necessary
and desirable to increase programmer
productivity, enforce systemization, and improve
controls. QA is required to quantitatively prove
that the SIP is a viable and worthwhile effort,
ensure that the improvements made actually
resolve the problems, identify and measure the
resultant improvements and benefits, and control
and enforce the quality of the software and the
improvement performance. Finally, training and
retraining are also necessary, for without them
successful accomplishment of the SIP would be
next to impossible and the improved software and
methodologies would quickly degrade.

The establishment of a SIP and the 
establishment of a SET are separate but 
interrelated and coordinated processes. Each can 
be established independently of the other, but 
each has a controlling or influencing effect on 
the other. That is, the standards and guidelines, 
procedures, tools, QA, and training established 
for the installation as a SET define the 
framework for, and provide a baseline from which, 
the SIP can operate. In this sense, the SIP 
cannot, for example, set standards that oppose 
those instituted in the SET, or use tools that 
conflict with the tool technology chosen for 
the organization and established in the SET. 

Conversely, standards and guidelines, 
procedures, tools, QA, and training, when 
established in a SIP, limit the choices of the SET. 
For example, the SET cannot institute software 
or installation standards different from those 
just implemented by a SIP, or maintained and 
enforced by the improvement tools. If either 
the SIP or the SET institutes engineering elements 
without considering the organizational impact, 
or the short- and long-term consequences, the 
resulting software and engineering activities 
will, at best, be chaotic and consist of a 
"patchwork" of styles, structures, and standards. 

A SIP and a SET should be established 
together, one complementing the other. This 
double-barreled approach to resolving software 
problems can be thought of as a Software and 
Technology Modernization Program (STMP). Figure 
6 illustrates a typical interrelationship of the 
SIP and SET as integral parts of a STMP. 
Implementing a STMP thus becomes an effort to: 

identify needed program-unique or 
organization-wide engineering elements 
(i.e., standards and guidelines, 
procedures, tools, QA mechanisms, and 
training); 

consider long-term software activities 
and consequences (e.g., ADPE and 
software compatibility, the upward 
compatibility of software changes, and 
technological advancements), and analyze 
the full spectrum of impact (e.g., 
across engineering activities, or 
between projects) before adopting any 
specific standards, procedures, etc.; 

isolate the program-unique engineering 
elements for the SIP from the 
organization-wide engineering elements 
for the SET; 

adopt and institute, as part of the SIP, 
the program-unique engineering elements; 
and 

adopt and institute, as part of the SET, 
the organization-wide engineering 
elements. 

7. Synopsis of a SIP 

The six key principles of a SIP are: 

Evolutionary Growth: Build on your past 
software investment as much as possible 
by purging some of the software, leaving 
some of it alone, replacing some software 
with packages, improving most of the 
software, and redesigning or newly 
developing only that which is absolutely 
necessary. 

Incremental Improvement: Minimize the 
risk of failure and make the SIP more 
manageable by grouping the software, 
through functional decomposition, into 
smaller, logical subpieces with minimal 
interfaces. 

Top-Down Planning with Bottom-Up Input: 
Plan in a hierarchical manner, from a 
general, overall SIP level, progressing 
to more specific, increment levels. 
Allow for continual feedback, analysis, 
and evaluation of improvement plans, 
results, and methodologies. 

SIP Pilot Project: Prototype the 
improvements, engineering elements, and 
methodologies on a small scale to 
empirically demonstrate the feasibility 
and success of the improvements, 
methodologies, and plans, and to help solve, 
early in the SIP life cycle, any 
technical problems that may occur. 

Release Specifications: Subdivide the 
improvements required for each increment 
into logically related sets of 
improvement activities that can be performed 
at one time (e.g., conversion, refinement, 
and enhancement). Develop release 
specifications which direct and control 
the improvements to be performed for 
each release of an increment. Be sure to 
include in the specifications the 
specific improvements required for each 
individual system, subsystem, program, 
module, job stream, and/or file in the 
increment, as well as the required 
deliverables, standards of performance, 
and acceptance criteria. 

Engineering Elements (SET): Establish 
within the SIP, or as a baseline SET, 
formalized engineering elements (i.e., 
standards and guidelines, procedures, 
tools, QA, and training) to be 
implemented, employed, and enforced by the 
SIP. 

While the SIP guidelines presented here may 
not seem to be "earthshaking," they are a less 
risky, more formalized means of modernizing the 
organization's information processing, and have 
been found to be "tried-and-true." The use of 
these SIP guidelines is strongly encouraged and, 
in concert with a strong SET, will promote more 
uniform, thorough, cost-effective, and efficient 
software and software engineering activities. 

8. SIP Case Studies 

The most successful organizations have 
recognized that their software is a key asset, 
which must be developed, managed, controlled, 
and maintained with as much care and attention 
as their other important assets. That is why 
these organizations have invested, or are 
investing, significant resources in SIPs, 
using principles similar to those described 
here. 

The experiences of these organizations 
should not be considered unique. The concept of 
a SIP is indeed a valid alternative to the two 
traditional choices of "don't-touch-it-or-it- 
will-fall-apart" and "tear-it-down-and-start- 
anew." Unless the functions are changing, 
substantial redesigns may not be necessary, and 
a SIP may solve immediate, as well as long-range, 
ADP problems. It is an alternative with 
principles and procedures applicable to most 
government agencies, and must be given serious 
consideration. 

Several organizations have successfully 
established and undertaken SIPs. Some examples 
of these organizations are the Office of 
Personnel Management (OPM) in Washington, DC; 
Raytheon Service Company in Boston, MA; NCR 
Corporation in San Diego County, CA; and the San 
Diego County Department of Education in San 
Diego, CA. 

Several years ago, OPM undertook a SIP with 
gratifying results. Many of its ADP systems 
were developed in Assembler language for an RCA 
Spectra 70/45 and, when converted to COBOL, 
still reflected the second generation logic of 
the earlier Assembler code. OPM decided to adopt 
a controlled system improvement approach, 
migrating in steps from a second to third 
generation system. The decision was also made to 
convert the Assembler code to ANSI COBOL to 
simplify maintenance and enhance portability 
considerations [5]. 

Similarly, in the late seventies and early 
eighties, Raytheon undertook a SIP with the 
primary objective of developing, implementing, 
and perfecting a reusable code methodology for 
accelerating applications development. By using 
reusable code, 40 to 60 percent of the 
redundancy in their business applications 
development was eliminated, and maintenance was 
substantially improved [6]. 

In late 1976, the NCR Corporation undertook 
a large-scale Quality Improvement Program (QIP) 
for a major set of systems software for over 103 
separate products. This software set included 
operating systems, compilers, peripheral 
software, data utilities, and telecommunications 
handlers, and totaled over 1\3 million lines of 
source code. The QIP was initiated to provide 
improvements in the software base and to take 
advantage of recent advances in the state of 
the art of software engineering. NCR found 
several major favorable effects resulting from 
the QIP, such as a substantial reduction in 
outstanding problems in the software base, a 
reduction in the average number of error reports 
per month, total elimination of problem 
backlogs, and a significant reduction in late 
responses to problem reports. All of these 
improvements are reflected in an improved 
perception of the quality of the software, and 
allowed NCR to make a very substantial 
redirection in funds from support of existing 
products to the development of new ones [2]. 

Finally, the San Diego County Department of 
Education's ADP data center cut its maintenance 
time by 70 percent as a result of a SIP. This 
startling reduction in the data center's 
program-maintenance load has resulted primarily 
from the decision to adopt and convert to 
structured design and programming techniques, 
with ongoing and formalized ADP training in 
these same areas. While the primary emphasis in 
this effort was on redesigning and replacing the 
existing systems, rather than salvaging the 
existing systems, the six key principles of a 
SIP were basically adhered to [7]. 

Besides the organizations that have 
established and undertaken successful SIPs, 
several Federal organizations are currently in 
the initial steps of establishing SIPs. 
These organizations include the 
Social Security Administration (SSA) in 
Baltimore, MD; the Veterans Administration's (VA) 
Data Processing Center (DPC) in Austin, TX; and 
the Defense Mapping Agency (DMA) in Washington, DC. 

The SIP being established by the SSA is a 
key example where the two traditional 
alternatives are infeasible. SSA is in the process of 
transitioning its more than 16,689 computer 
programs, containing over U million lines of 
code, from a "survival" mode to a state-of-the- 
art environment. To leave the systems alone 
will only result in further ADP system 
deterioration and seriously jeopardize the agency's 
ability to perform its basic mission. 
Conversely, to redesign all software would be 
extremely risky, require maximum reinvestment of 
resources, and require more time than SSA has to 
survive the current crisis. Thus, the key is an 
incremental, evolutionary improvement approach, 
aimed at a recovery of SSA's heavy investment in 
its software, and the ability to take advantage 
of new ADP technological advances [8]. 

The VA's Austin DPC is another example 
where top management has recognized that the 
efficiency and effectiveness of their mission is 
a function of their software. The Austin DPC 
has over three million lines of code of 
application software. Like software in most 
government agencies and industry, this software was not 
developed overnight; rather, it has evolved over 
many years, with layer upon layer of 
modification making it even more complex, unmanageable, 
and unmaintainable. Together with a hardware 
upgrade and SET initiative, the Austin DPC is 
initiating a SIP to cut escalating ADP costs, 
improve the quality of service to the veteran, 
and better support far-reaching management 
decisions [9]. 

The DMA is currently in the initial stages 
of a five-year program to upgrade its software 
and modernize its software production practices. 
The ultimate objectives of this SIP are to 
increase productivity, improve software quality, 
and standardize software development practices. 
DMA's SIP encompasses the three major areas of 
introducing a modern programming environment, 
improving existing software, and upgrading 
development and management skills to support the 
new environment [10]. 

Several more organizations establishing 
SIPs are Tupperware in Orlando, FL; Ford 
Aerospace and Communications Corporation in 
Sunnyvale, CA; and the New Jersey State Government. 

From the preceding discussion, it should 
be clear that organizations that can no longer 
afford outdated and inefficient information 
processing, want to combat the software crisis, 
want to stop software "senility" in its tracks, 
and, on the whole, need to modernize their 
software and software engineering technologies 
must establish a SIP. A commitment to undertake 
a SIP begins with top management, progresses 
through the ADP organization, and, ultimately, 
ends with the user. Top management commitment 
to the SIP is a major factor for its success, 
and manifests itself in three forms: 

First, top management must acknowledge 
that a software problem really exists, 
and resolve to correct it. 

Second, top management must be willing 
to "put their money where their mouth 
is." That is, they must not offer only 
"lip service," but be willing to devote 
the resources necessary to implement the 
SIP. Resources include people, dollars, 
time, ADPE, tools, and other 
miscellaneous supplies and materials. 

Third, top management must actively 
support the SIP and ADP organization by 
helping to advertise the SIP goals, 
objectives, benefits, and achievements, 
and by gaining user involvement and 
support. 

There are three documents currently 
published on the subject of software improvement 
that may be of further interest to organizations 
contemplating a SIP. These documents are listed 
in the references [1, 11, 12] and contain more 
detailed information on the need for software 
improvement, planning and implementing a SIP, 
and the actual software improvement process. 

1. Introduction 

Many major ADP centers have a decade or more 
invested in their applications software, and the 
organizations they support are almost totally 
dependent upon its operation. The software works 
in that its logic is basically correct and it 
supports the organization's mission. In most 
centers, merely keeping the software operational 
is such a difficult task that any change to the 
baseline, either to enhance or add functions, is 
reasonably perceived as both risky and expensive. 
However, the software must be improved to keep 
ADP costs low and service reliable. Software 
improvement is the process of modernizing software 
by retaining its fundamental logic while 
upgrading its reliability, economy, and flexibility. 
The goal of improvement is to posture software to 
take advantage of new technology. 

The software improvement approach differs 
from traditional system development because 
software improvement recognizes the following factors: 

o the fact that current systems "work"; 
o the substantial investment in current 
  systems; 
o the technical and informational properties 
  of current systems; 
o the interchangeability of some of these 
  properties with functions of other systems; 
o the risks in developing new programs in any 
  but trivial quantities; 
o the fact that the only specifications for 
  many existing systems are the systems 
  themselves; 
o the desirability of an orderly improvement; 
o the need for a usable system at every stage 
  of improvement; and 
o the cost and technical difficulties of 
  re-engineering a large system within a 
  reasonable time frame. 



The combination and interaction of the above 
factors suggest that most organizations with 
substantial investments in software should improve 
their software incrementally and not redevelop the 
total system. Some software is so old and so 
outdated that it can't be improved. However, where 
software improvement is possible, it has the 
following advantages: 

1. minimizes risk by retaining an operable 
baseline; 

2. preserves value of past software investment; 

3. upgrades in small, manageable, and testable 
elements; and 

4. progresses by iteratively enhancing the 
existing baseline. 

Organizations with large software investments, 
like the Federal Government, are underwriting 
ambitious programs of software improvement. For 
instance, the Social Security Administration, the 
Defense Mapping Agency, and the Veterans 
Administration have all initiated large software 
improvement programs to modernize their computer 
software, including its features, capabilities, and 
engineering practices. Also, the General Services 
Administration Office of Software Development has 
initiated a government-wide software improvement 
program to upgrade software systems across the 
Federal Government (OSD-81-1UA). 

Normalization is the initial and most 
fundamental technical process in achieving software 
improvement. It is the process of standardizing 
existing systems, thus making them maintainable, 
enhanceable, and portable, as well as posturing them 
to take advantage of new technology. After a 
system is normalized, it can be more easily 
refined and optimized, new functions can be added in 
a predictable manner, and economies can be 
realized. 

This paper defines normalization by: (1) 
demonstrating how it improves a typical 
applications capability (our example will show a COBOL 
application that utilizes both a Teleprocessing 
Monitor (TP) and a Data Base Management System 
(DBMS)), (2) describing how it can be automated, 
and (3) providing two examples in which it has 
been used to improve software. 

2. Normalization: A Definition 

Normalization is the process of improving 
software by making it: 

o enhanceable; 

o maintainable; and 

o portable. 

Enhanceable software can be economically im- 
proved either by adding new functions or by re- 
fining and optimizing current functions. Main- 
tainable software can be quickly and reliably 
fixed when it breaks or can be modified when 
changes are requested. Portable software can be 
moved from one computing environment to another 
to take advantage of the technological economies 
and advancements of new hardware or software. 
Normalization improves these attributes of exist- 
ing software. 

Exhibit 1 outlines a typical COBOL system 
before it has been improved through normalization. 
The applications software is intertwined with the 
supporting environment, as well as knotted with 
extensions to a vendor's COBOL, with interfaces 
to a DBMS and to TP. 

[Exhibit 1. Typical COBOL On-Line System Before 
Normalization. Diagram labels: COBOL application 
program (program must have knowledge of system; 
code with vendor extension; package-specific TP 
call; package-specific DBMS call); vendor's 
compiler; TP package; DBMS package.] 



The applications software must "know" the 
characteristics of the DBMS, the TP, and the 
operations control language (OCL) peculiar to the 
computing environment. The program syntax has 
extensions that frequently are inconsistent with 
ANS standards. Program semantics use data in 
ways peculiar to the source vendor's architecture. 
Program logic is linked to the DBMS, the TP, and 
the operating systems by imbedding "knowledge" of 
these packages throughout the code. As a result, 
any significant change in either the code or the 
packages has a profound impact upon the other. 
The problems of maintaining and enhancing such a 
system are significant. 

Exhibit 2 depicts a normalized capability in 
which the knots have been untied and the 
applications "freed" from the host system through 
standardization and information "hiding." The program 
syntax has been normalized into ANS-approved 
constructs and the semantics normalized so the 
programs will produce the same results if they are 
executed on other vendors' computers. The 
knowledge of the DBMS, the TP, and the operating 
system has been separated from the application 
logic and embedded in bridge programs. This means 
the application programs can be changed without 
impacting the system, and that the DBMS or TP can 
be changed without altering the program's logic. 
The interface knowledge is "hidden" in the bridge 
programs. There is some processing overhead 
involved in hiding the information. The cost 
benefits come from reductions in software costs. 
Normalization trades extra computer cycles for 
improvement in software quality. 
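
The bridge-program idea can be illustrated with a short sketch. It is 
not taken from the paper, which concerns COBOL systems; it is a minimal 
Python analogy under stated assumptions. The names RecordStore, 
VendorADbmsBridge, VendorBDbmsBridge, and apply_raise are hypothetical, 
and both "DBMS packages" are simulated in memory. 

```python
from typing import Protocol


class RecordStore(Protocol):
    """Standardized interface the application logic is written against."""

    def read(self, key: str) -> dict: ...

    def write(self, key: str, record: dict) -> None: ...


class VendorADbmsBridge:
    """Bridge for a hypothetical DBMS that stores records under upper-case keys."""

    def __init__(self) -> None:
        self._segments: dict[str, dict] = {}

    def read(self, key: str) -> dict:
        # Package-specific conventions live here, not in the application.
        return dict(self._segments[key.upper()])

    def write(self, key: str, record: dict) -> None:
        self._segments[key.upper()] = dict(record)


class VendorBDbmsBridge:
    """Bridge for a second hypothetical DBMS that keeps an append-only log."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, dict]] = []

    def read(self, key: str) -> dict:
        # The most recent row for the key is the current record.
        for stored_key, record in reversed(self._rows):
            if stored_key == key:
                return dict(record)
        raise KeyError(key)

    def write(self, key: str, record: dict) -> None:
        self._rows.append((key, dict(record)))


def apply_raise(store: RecordStore, employee_id: str, percent: float) -> float:
    """Application logic: unaware of which package sits behind the bridge."""
    record = store.read(employee_id)
    record["salary"] = round(record["salary"] * (1 + percent / 100), 2)
    store.write(employee_id, record)
    return record["salary"]
```

Swapping one bridge for the other leaves apply_raise untouched, which is 
the property the paragraph above attributes to a normalized system. The 
extra method call through the bridge corresponds to the processing 
overhead the paper mentions. 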



[Exhibit 2. Typical COBOL On-Line Capability After 
Normalization. Diagram labels: program freed of 
system knowledge; system knowledge in bridge 
program.] 



Exhibit 3 outlines a total picture of how 
normalization improves applications software. 
The system has been made more enhanceable, 
maintainable, and portable as a result of 
normalization. The software has been improved while 
preserving the legacy and investment in the 
applications logic. This improvement was managed 
without redevelopment. Thus, the risks of developing 
new code were avoided; the costs and technical 
difficulties of engineering a new system were 
obviated; and the investment in the current system 
was preserved. The following sections will 
describe how to normalize a system and will offer 
two examples of a successful normalization. 

Exhibit 3. Comparison of a System Before 
and After Normalizing 

3. Normalization: The Process 

To improve a software applications system, 
each system element should be normalized. This 
may include upgrading the programs, DBMS, TP 
monitor, operating system, and documentation, as 
shown in Exhibit 4. The basic program logic will not 
be changed; the software foundation will. The 
programs must be normalized both in terms of their 
syntax and semantics. This entails translating 
non-ANS statements into ANS and normalizing the 
use of data. The DBMS must be normalized in terms 
of program calls and interfaces, data schema, and 
data representation. This entails standardizing 
the interface between the program's logic and the 
system's handling of data. The TP monitor must be 
normalized in terms of program calls and 
interfaces, message handling capabilities, and screen 
formats. This entails standardizing the interface 
between the program's logic and the system's 
handling of transactions. The operating control 
language must be normalized in terms of standardized 
utilities and job streams. This entails 
standardizing the interface between the program's logic 
and the operating system. The program 
documentation must be normalized by applying a 
standardized and automated documentation tool 
that will keep it accurate and current. 

To address each system element, three tasks 
must be completed. These tasks, outlined in 
Exhibit 5, are: Evaluation, Translation, and 
Verification and Validation. Each must be 
completed for each system element shown in Exhibit 
4. 



PROGRAMS 
   o Syntax 
   o Semantics 

DATA BASE MANAGEMENT SYSTEM 
   o Program Calls 
   o Data Schema 
   o Data 

TELEPROCESSING MONITOR 
   o Program Calls 
   o Message Handling 
   o Screen Formats 

OPERATING CONTROL LANGUAGE 
   o Utilities 
   o OCL 

DOCUMENTATION 
   o Automated Program Documentation 

Exhibit 4. Software Elements That Must 
Be Normalized 

EVALUATION 
   o Identify each change to each element 
   o Summarize and analyze these changes 

TRANSLATION 
   o Make each needed change 
   o Incorporate change into system 

VERIFY AND VALIDATE 
   o Test translated system 
   o Certify functional equivalence 

Exhibit 5. Tasks That Must Be Accomplished 
to Normalize a System 

Evaluation (EVAL) identifies the individual 
changes to each software element and the manner in 
which each change must be made. EVAL also 
summarizes these changes and performs an analysis on the 
summary. This analysis provides a quantitative 
picture of the difficulty of normalization and 
provides statistics on which to base the schedule 
and cost of the subsequent tasks. Exhibit 6 shows 
how an automated evaluation technology parses the 
system elements, identifies each change, and 
summarizes the changes. 
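
As a rough illustration of the evaluation task (identify each change, 
then summarize), the following Python sketch scans source text against 
a rule table and counts the hits per rule. It is not the paper's EVAL 
tool. The rule names, the patterns, and the 'VENDORDB' call name are 
assumptions chosen only for illustration, and a real evaluator would 
parse the language rather than match text line by line. 

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    line_no: int  # 1-based line number in the scanned source
    rule: str     # name of the rule that flagged the line
    text: str     # the flagged source line, stripped


# Hypothetical rule table: each pattern marks a construct to be normalized.
RULES = {
    "EXAMINE verb": re.compile(r"\bEXAMINE\b"),
    "TRANSFORM verb": re.compile(r"\bTRANSFORM\b"),
    "embedded DBMS call": re.compile(r"\bCALL\s+'VENDORDB'"),
}


def evaluate(source: str) -> list[Change]:
    """Identify every line that matches a rule (one Change per match)."""
    changes = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                changes.append(Change(line_no, rule, line.strip()))
    return changes


def summarize(changes: list[Change]) -> Counter:
    """Count identified changes per rule, as input to cost estimates."""
    return Counter(change.rule for change in changes)


SAMPLE = """\
    MOVE A TO B.
    EXAMINE WS-NAME REPLACING ALL ' ' BY '0'.
    CALL 'VENDORDB' USING DB-AREA.
    EXAMINE WS-CODE TALLYING ALL 'X'.
"""
```

Calling summarize(evaluate(SAMPLE)) yields a per-rule count, which is 
the kind of statistic the paper says schedules and cost estimates can 
be based on. 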

Exhibit 8 compares three methods of 
organizing a normalization: the manual approach, the 
partial automated approach, and the integrated 
automated approach. These approaches differ in 
the amount of technology each uses. The cost and 
risk of normalization can be significantly 
reduced as the technology for normalizing is improved 
from disjointed manual processes to integrated 
automated processes. 

Exhibit 6. Automated Software Evaluation 

Translation (TRAN) makes each change 
identified in the evaluation and incorporates these 
changes into the system. The translator(s) must 
change the programs, DBMS, TP, OCL, and 
documentation. Thus there is a specific TRAN tool for 
each of these elements. TRAN changes the programs 
into ANS COBOL and normalizes their use of data; 
builds DBMS bridge programs, normalizes DDL schema, 
and translates data into the new models and 
representations; builds TP screen formats and 
standardized interfaces; builds new OCL streams; and 
generates new documentation. 

Validation and Verification (V&V) tests each 
change and validates that the normalized system 
is the functional equivalent of the pre-normalized 
system. V&V testing verifies the normalized 
software at both the unit level and the system 
level. V&V provides a testbed for testing, 
verifying, and validating the normalization. 

Exhibit 7 outlines a five-by-three array 
that represents the work that must be accomplished 
to normalize a system. The intersection of each 
system element and each normalization task 
represents work that must be accomplished to complete 
the normalization. The composite of these 
intersections represents the total effort to do the 
job. Additionally, each individual piece of work 
must be coordinated into a broader, integrated 
effort. If the pieces are not coordinated, 
the sum of the costs of the parts may be much 
greater than the value returned by the total 
effort. When this happens, normalization, like 
system development, becomes a risky, expensive 
undertaking. 
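
The five-by-three array can be written out directly. The sketch below 
is only an illustration in Python: the element and task names come 
from Exhibits 4 and 5, while the status strings and the function names 
work_matrix and remaining are assumptions. 

```python
from itertools import product

# The five system elements (Exhibit 4) and three tasks (Exhibit 5).
ELEMENTS = [
    "Programs",
    "DBMS",
    "TP monitor",
    "Operating control language",
    "Documentation",
]
TASKS = ["Evaluation", "Translation", "Verification and validation"]


def work_matrix() -> dict[tuple[str, str], str]:
    """One cell of work per (element, task) intersection: 5 x 3 = 15 cells."""
    return {cell: "not started" for cell in product(ELEMENTS, TASKS)}


def remaining(matrix: dict[tuple[str, str], str]) -> list[tuple[str, str]]:
    """Cells still to be completed before the normalization is finished."""
    return [cell for cell, status in matrix.items() if status != "done"]
```

Tracking all fifteen cells in one structure is a small-scale version 
of the coordination the paragraph calls for: no intersection can be 
overlooked, and the remaining work is always visible as a whole. 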

Exhibit 7. Relationships of Elements That Must 
Be Normalized to Tasks Needed to Normalize 

Exhibit 8. Comparison of Three Methods of 
Organizing a Normalization Project 

The most risky approach is the conventional 
manual process that relies totally on manual 
effort to complete the job. This method features 
all of the problems associated with finding, keep- 
ing, and managing large staffs plus the problems 
of managing the quality of unpredictable, error- 
prone, manual techniques. The manual approach 
entails the risks of redevelopment without offer- 
ing the potential benefits of a "new" system. 

The partial automated approach is an 
improvement over the manual. It relies on selecting 
tools from various sources and combining these 
tools with the man-hours needed to use them. This 
method is an improvement over the manual because 
it applies some modern technology (e.g., it 
substitutes predictable technology for manual 
efforts). However, it is plagued with the 
management problems of coordinating disjointed tools and 
techniques, adjusting uneven staffing requirements 
(some tools are more labor saving than others), 
and integrating these variables into a total 
project effort. Often the gains of using disjointed 
tools and techniques are lost to the 
inefficiencies of managing and coordinating them. 

The preferable methodology is the integrated 
automated approach depicted in Exhibit 8, which 
features tools that not only fit each specific 
work area but are also part of a total technical 
process. This preferable technology is tool or 
technique intensive and "fills" each box with a 
large amount of automation and a much smaller 
amount of manual effort. The tools reinforce each 
other in an integrated and supportive manner. For 
example, the evaluation tools (EVAL) provide 
information and a framework that the translation 
tools (TRAN) use to make the changes identified in 
the evaluation phase. These changes provide the 
audit trail that the verification and validation 
tools (V&V) use to verify the quality of the 
normalized system. 
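
The hand-off between tools described above (planned changes feed the 
translator, and the translator's record of what it did feeds 
verification) can be sketched as follows. This is a toy Python 
illustration, not the paper's tooling. PlannedChange, translate, and 
verify are hypothetical names, and the tokens VENDOR-READ and STD-READ 
are invented placeholders rather than real COBOL. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedChange:
    line_no: int  # 1-based line the evaluation flagged
    old: str      # text the evaluation expects to find
    new: str      # replacement text


def translate(source_lines, planned):
    """Apply each planned change; return (new_lines, audit_trail)."""
    result = list(source_lines)
    audit = []
    for change in planned:
        before = result[change.line_no - 1]
        if change.old not in before:
            # The source no longer matches what the evaluation saw.
            raise ValueError(f"line {change.line_no}: {change.old!r} not found")
        after = before.replace(change.old, change.new, 1)
        result[change.line_no - 1] = after
        audit.append((change.line_no, before, after))
    return result, audit


def verify(original_lines, translated_lines, audit):
    """True when the set of changed lines equals the set of audited lines."""
    if len(original_lines) != len(translated_lines):
        return False
    audited = {line_no for line_no, _, _ in audit}
    differing = {
        i
        for i, (a, b) in enumerate(zip(original_lines, translated_lines), start=1)
        if a != b
    }
    return differing == audited
```

The check in verify is deliberately narrow: it confirms only that 
nothing changed outside the audit trail. Establishing functional 
equivalence still requires running the system against a baseline. 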

Exhibits 9a and 9b show the work flow for an 
automated normalization. It starts with the 
planning step and ends with a fully tested 
system. Each step in the process is assisted by 
techniques whose products integrate with 
subsequent steps. The steps should be aided by tools 
as follows: 

Exhibit 9a. Phases, Steps & Tools of 
Normalization Work Flow and Work Processes 

Exhibit 9b. Normalization Work Flow and 
Work Processes 



The amount of effort for the total job and 
the distribution of work per phase depend upon 
the size of the job and the difficulty of the 
normalization. However, the integrated approach 
makes the effort predictable and the risk 
manageable. 

Two Key Steps 

The two key steps in managing a successful 
normalization are baselining and system testing. 
Normalization is not simply performing the 
translation step. Although automated translation is 
necessary, it is certainly not sufficient to manage 
the normalization process. Every step should be 
automated, but the baselining and testing steps are 
especially critical because these steps can have 
the most deleterious impact on the normalization 
effort. 

Baselining and System Testing can be very 
labor intensive and very unpredictable if they 
are not aided by automation. As much as fifty 
percent of an improvement effort can be spent in 
baselining and system testing; and if these steps 
are not managed correctly, the normalization 
effort will fail. Automation makes these steps 
both manageable and predictable. 

The baselining step, shown in Exhibit 10, 
prepares the software for normalization. It 
produces a definitive specification, an empirically 
derived picture of the current production software 
system. It includes not only each of the software 
elements but also the dynamic behavior of these 
elements as they are tested, including test data, 
test transactions, and processing results. 

Exhibit 10. Baselining Step in Planning 
Process 

The System Test, shown in Exhibit 11, verifies 
that the normalized system is functionally 
equivalent to the baselined system. It uses the 
definitive specification provided by the baseline to 
verify that the improved normalized system produces 
"exactly" the same answers as the baselined copy. 

Exhibit 11. System Test Step in Conversion 
Phase 

One of the great assets of normalization is 
this ability to produce a definitive specification 
and to test against it. After a baseline is 
captured, it is the specification. The improved 
system must produce the exact answers as the 
baseline. When it does, the system is accepted as 
functionally equivalent. Because the V&V 
technology helps automate this process, normalization 
verification can be predictably planned and 
managed and its cost controlled. 

The automated V&V technology is used in both 
baselining and testing. For Baselining, the V&V 
technology is used: (1) to instrument the 
software with probes; (2) to run test data against 
the instrumented programs; and (3) to capture 
the execution behavior, the flow of the programs, 
and the expected output. Exhibit 10 shows a 
step-by-step flow of the baselining process. The 
system elements, test data, execution stream, and 
expected output are all captured and are all part 
of the baseline. This totality forms the 
acceptance criteria for the normalized system. It is 
used as the definitive specification in the 
System Test step to verify that the normalized 
system is the functional equivalent of the 
baselined system. 

For System Testing, the V&V technology is
used as shown in Exhibit 11. The normalized
software is instrumented and tested under the same
conditions as the baselined software. The system
behavior during the test is captured and stored.
This behavior is compared with the results produced
in the baselining step. Any discrepancies
between the results of the System Tests and the
Baseline Test are identified, and the code is corrected
until it is functionally equivalent.
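
The baseline-then-compare cycle described above can be illustrated with a minimal Python sketch. It is not the V&V tooling used in the projects; the names `Baseline`, `capture`, `discrepancies`, `original`, and `normalized` are hypothetical, and a "probe" is assumed to be nothing more than a callback that records a label in the execution stream.

```python
from dataclasses import dataclass, field


@dataclass
class Baseline:
    """Captured behavior of one test run: probe labels in order, plus outputs."""
    execution_stream: list = field(default_factory=list)
    output: list = field(default_factory=list)


def capture(program, test_data):
    """Run an instrumented program against the test data and record its behavior."""
    run = Baseline()
    for record in test_data:
        # The program calls the probe callback at each instrumented point.
        run.output.append(program(record, run.execution_stream.append))
    return run


def discrepancies(baseline, candidate):
    """List every difference between the baseline run and the candidate run."""
    found = []
    for i, (want, got) in enumerate(zip(baseline.output, candidate.output)):
        if want != got:
            found.append(f"record {i}: expected {want!r}, got {got!r}")
    if len(baseline.output) != len(candidate.output):
        found.append("output count differs")
    if baseline.execution_stream != candidate.execution_stream:
        found.append("execution stream differs")
    return found


def original(record, probe):
    """Stand-in for the pre-normalization program (hypothetical)."""
    probe("validate")
    if record < 0:
        probe("reject")
        return "ERR"
    probe("post")
    return record * 2


def normalized(record, probe):
    """Stand-in for the normalized program: different code, same behavior."""
    probe("validate")
    if record < 0:
        probe("reject")
        return "ERR"
    probe("post")
    return record + record


test_data = [1, -2, 3]
baseline = capture(original, test_data)            # baselining step
system_test = capture(normalized, test_data)       # system test step
print(discrepancies(baseline, system_test) or "functionally equivalent")
```

An empty discrepancy list plays the role of acceptance: the candidate reproduced both the outputs and the execution stream of the baseline on the same test data.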

Each phase and step of the normalization 
process can be automated and controlled to pro- 
duce predictable results. Through automated 
normalization, software can be improved in a
manageable fashion and the risk and expense of 
improving software can be tightly managed and 
controlled. 

The following section provides two examples 
of software improvement projects that used auto- 
mated normalization to modernize software. The 
first example involves using normalization to 
minimize the risk of transition in the hardware 
acquisition of a DoD Agency. The second example 
involves using normalization to move a major on- 
line capability from one computing environment 
to another. 

4. Normalization as Part of a
Hardware Selection Strategy

EXAMPLE 1. A DoD Agency is replacing 35
large mainframes that are part of a nationally
distributed capability, are interconnected
by an internal telecommunications network, and are
connected to a DoD-wide digital data network.
The hardware inventory includes IBM and Honeywell
computers, and the software inventory includes
approximately eight million lines of COBOL
and assembler, several data base management
systems, and several TP monitors. The Agency is
planning to replace most of these computers in
a single buy and to convert to a single vendor's
DBMS and TP software package.



The risk of converting both the hardware and
software in one step was so great that the Agency wisely
decided to minimize the software-related risks by
improving software prior to its conversion. The
Agency chose to normalize its inventory of programs
into ANS 74 COBOL and a single DBMS,
operating system, and TP monitor prior to acquisition.
It used normalization to help in both
hardware procurement and software improvement. In
its hardware selection, it used software and
normalization to create a multi-target benchmark
package that is executable across many vendor lines
and to provide this package for the Live Test
Demonstration (LTD) of each vendor's proposed
hardware solutions.

In its software improvement, the Agency used
normalization to upgrade its most critical programs
into a more portable, maintainable, and enhanceable
status prior to hardware implementation.
Thus, by normalizing, the Agency met its two basic
goals:

1. Minimizing transition risk by putting its most
critical software into a normalized posture
prior to hardware installation, helping ensure
the Agency of a smoother, lower-risk transition
period.

2. Maximizing hardware competition by providing
every vendor with the same normalized, easily
executable LTD benchmark package that will
minimize the bidding vendors' conversion costs
and maximize their ability to compete for
government business. This competition will
reduce the Agency's life cycle costs and increase
its systems quality by multiplying its
options of possible vendors.

Exhibit 12 shows how the Agency's benchmark
package for the LTD was selected, normalized, and
verified.

Exhibit 12. Normalizing + Verifying
Benchmark Package for LTD

o Representative programs and transactions were
selected from across the Agency's inventory
of systems. These programs represent the
types of processing the Agency will conduct
over the next ten years and the types of
processing the hardware must support in
terms of the functions, volume, and mix of
transactions.

o Since the Agency's processing demands are
impacted by sudden and extreme surges in
transaction volumes, the LTD must also model
these heavy but sporadic transaction loads.

o The programs selected for the benchmark package
were normalized into a multi-target
status. This normalization was verified on
IBM in a CINCOM TOTAL MVS environment and on
UNIVAC in a DMS 1100 environment. These are
two radically different environments. This
process involved normalizing:

   o languages;

   o interfaces;

   o DBMS;

   o TP;

   o OS;

   o hardware; and

   o documentation.

Automation makes the process technically 
possible. 

o Since the normalizing process is automated,
the verification took place in a very timely
manner, matching the writing and release of
the hardware RFP. The benchmark package that
was provided to vendors for the LTD included the
normalized programs and data together with
detailed specifications for the bridges.

The multi-target normalization established
a common baseline of programs and transactions
that vendors benchmarked in the LTD, thus providing
the Agency with two valuable measures:

1. The performance of the normalized software
and

2. A comparison of the performance of each vendor's
hardware using the normalized software.

Because the code is normalized, it can also be 
used to provide performance measurement after 
installation of the new equipment. 

Normalization helped meet the goals of
maximum competition and minimum conversion risk.
Competition is maximized by using portable software
for the LTD, thus minimizing vendors' conversion
costs and risks. This should induce more
vendors to compete to supply the Agency with hardware.
Life cycle costs are minimized by having
a unified software environment normalized for ease
of maintenance and enhancement and fitted to the
hardware so processing loads can be met. This
should reduce the software maintenance and enhancement
costs that contribute approximately 80% of
the Agency's ADP operational expenses.



Normalization is a Migration Strategy 

EXAMPLE 2. A U.S. Government civilian
Agency operates a nationally distributed network
of terminals connected to a large data center.
As the initial step in its computing modernization
strategy, the Agency normalized its software capability
and migrated from twin Honeywell 6680s to
an IBM 3081 environment. The characteristics of
the normalization were as follows:

Project Initiation:      April 1982
Project Completion:      January 1983
System Implementation:   17 January 1983

Lines of Code:           200,000 COBOL
Terminals:               550
Transactions:            250,000 per day average

SOURCE COMPUTER          TARGET COMPUTER
H6680 (2)                IBM 3081
'68 COBOL                '74 COBOL
TDS                      CICS
IDS                      VSAM



The normalization process moved the Agency's 
capability from the source to the target environ- 
ment, standardized the programs into ANS 74 COBOL, 
implemented information hiding of both the DBMS 
and TP functions, and documented the programs. 
Exhibit 13 shows the change brought about by
normalization.



Exhibit 13. Example of Normalization

The Agency had three basic goals that had to
be met for a successful conversion:

o uninterrupted service during conversion;
o reliable and maintainable information
  following conversion; and
o conversion in a very tight time frame.

Automated normalization was selected as the 
migration strategy because it met these goals. 

The Agency's Data Processing Center supports 
a nationally distributed network of terminals that 
is an integral part of the everyday operations of 
the total Agency and is critical to supporting 
the Agency's mission. It is essential that the
computer provide uninterrupted and responsive
service to the Agency's network of terminals. It
was mandatory that this service be provided
throughout the conversion process and that
responsiveness not degrade
during the transition period.

The Agency gathers, stores, tracks, and
accounts for financial data representing billions
of dollars per year and for personnel data covering
hundreds of millions of people per year.
This information must be accurate and auditable
at all times. Therefore the software that gathers,
processes, and supports this information must be
both reliable in its operations and maintainable
as it is changed. It was mandatory that the
conversion process improve both the reliability
and the maintainability of the software.

The Agency's schedule dictated that the conversion
be completed in a nine-month time frame,
that parallel processing be limited to less than 
30 days, and that the target environment be re- 
liable and maintainable immediately after tran- 
sition. It was mandatory that the conversion 
process be automated and meet this very tight 
schedule. 

Exhibit 13 shows the before and after picture
of the normalization. The system was normalized
during the April to December period, was run in
parallel for less than 30 days, and went live on
January 17, 1983. The Agency met all three of
its conversion goals. The Data Center provided
uninterrupted service during transition, the
software remained reliable and auditable, and the
conversion was completed on time, meeting the
Agency's schedule.

5. Summary 

Software improvement is a procedure that 
preserves an ADP organization's past investments 
and sunk costs in programs and information pro- 
cessing. Software improvement differs from soft- 
ware redesign or development because it minimizes 
the risks of reprogramming by modernizing in 
incremental, testable steps. Through software 
improvement, organizations can modernize their 
information processing in a controlled manner, 
thus minimizing risks. 

Normalization is the initial step in software 
improvement; it is the standardization of existing 
software, making it more enhanceable, maintainable,
and portable. Automated normalization is a cost-
effective and resource-efficient technical fact
that has been successfully demonstrated in several
instances and is available in today's marketplace.
Automated normalization makes software improve- 
ment highly economical, feasible, and timely. In 
summary, it is a way to modernize existing systems
while preserving past investments as much as
possible and putting the systems in a position
to take advantage of new and emerging ADP technologies.

ABSTRACT



This paper describes a CPU sizing methodology developed by the author for a corporate Performance and
Configuration Group. The objective of this study was to effectively predict CPU utilization and total workload
turnaround time for future batch workloads. This was accomplished through the implementation of certain algebraic
models which successfully model the various components (i.e., CPU, I/O, etc.) of a computer system, capturing the
dynamic interrelationship of hardware configuration, operating system logic and application workload. The result of
this work is an algorithm which will accurately forecast average CPU utilization, volume independent CPU
utilization, initiator turnaround and workload turnaround time for a given workload on various CPU models (3031,
168-3, 168-3 MP, 3033, 3033 MP). From a planning viewpoint, this information is extremely important in
determining hardware needs as application workload characteristics change.



1. INTRODUCTION 

This paper describes a CPU sizing methodology 
developed by the author for a corporate Performance 
and Configuration Group. The objective of this study 
was to effectively predict CPU utilization and total 
workload turnaround time for future batch workloads. 
This was accomplished through the implementation of 
certain algebraic models which successfully model the 
various components (i.e., CPU, I/O, etc.) of a 
computer system, capturing the dynamic 
interrelationship of hardware configuration, operating 
system logic and application workload. The result of
this work is an algorithm which will accurately
forecast average CPU utilization, volume independent
CPU utilization, initiator turnaround and workload
turnaround time for a given workload on various CPU
models (3031, 168-3, 168-3 MP, 3033, 3033 MP) (see
footnote 1). From a planning viewpoint, this information is
extremely important in determining hardware needs as
application workload characteristics change. The
model is formally developed in Appendix A and an
overview and application of the model is presented in
the following sections.

Footnote 1: See Appendix B for the MIPS rates for the various CPU models.

2. Model Overview 

Input to the model consists of "jobs" which are
characterized by total problem program (pp) seconds
consumed, I/O counts by device type (e.g., 3330, 3350,
3400, etc.), total I/O count and initiator (priority
group) assignment. From this input, the model
predicts three system variables: CPU utilization,
initiator turnaround and workload turnaround.
Considering the complexity of a computer system
which employs hardware and software interrupting,
priority scheduling and system resource competition
through the multiprogramming of jobs, this is a
nontrivial task. The algebraic models implemented,
however, are simple but effective in capturing the
dynamics of such a computer system. For this model
average I/O behavior is assumed (i.e., minimal I/O
queueing).

The algebraic model approach views a workload as a
queue of jobs competing for CPU time, where CPU
time can be spent in problem program state (pp),
supervisor state (ss) or wait state. The total number
in queue is bounded by the total number of initiators
active (multiprogramming level), and jobs are
prioritized in order to keep the CPU as active as
possible. The job initiated with the highest priority
has the best chance of securing CPU time. CPU
control is relinquished when I/O is requested by the
job holding the CPU or if a higher priority job causes
an interrupt. Once the I/O request is satisfied, the job
again competes for CPU time. Viewed in this way, the
amount of CPU time consumed for a job in a given
period depends on the character of the competing jobs.
Taking this one step further, it is possible to view this
activity as a competition between initiator workloads
rather than jobs. The character of an initiator is
determined by the character of all jobs assigned to it.
Each initiator is characterized by its I/O rate, its
standalone CPU rate and its turnaround time. The
dynamic interrelationship of competing initiators is
modeled through a set of simultaneous equations.

The number of simultaneous equations needed to model a
given workload is the total number of initiators
assigned to the workload. The solution of these
equations determines which initiator drops out of the
mix first since the "work" assigned to that initiator is
complete. A new set of simultaneous equations for the
remaining initiators and remaining "work" is then
constructed. The solution to each set of simultaneous
equations is the degraded CPU rates (see footnote 2) for the active
initiators in each time interval. Using these degraded
CPU rates, the interval each initiator needs to complete
its workload is computed. The shortest interval
identifies the initiator that drops out. The workloads
for the other initiators are reduced by the CPU time
consumed by each initiator over the interval. The
algorithm continues iteratively until all initiators
complete their "work". Total workload turnaround
then is the time it takes for the last initiator to
complete its work measured from the start of the first
initiator. CPU utilization for the machine under study
is then calculated.
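
The iteration just described can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and `equal_share` is a toy placeholder for the simultaneous equations of Appendix A (it simply splits one CPU evenly among the active initiators). The utilization returned here counts problem program time only, since supervisor state is not modeled in the sketch.

```python
def run_mix(work, solve_rates, eps=1e-9):
    """Iterate until every initiator has finished its CPU work.

    work        -- {initiator: CPU seconds of work assigned to it}
    solve_rates -- callable(remaining) -> {initiator: degraded CPU rate}
                   in CPU seconds per elapsed second; a stand-in for
                   solving the simultaneous equations of the model
    Returns ({initiator: completion time}, total turnaround, utilization).
    """
    remaining = dict(work)
    finished = {}
    elapsed = 0.0
    while remaining:
        rates = solve_rates(remaining)
        # Shortest interval in which some active initiator completes.
        dt = min(remaining[i] / rates[i] for i in remaining)
        elapsed += dt
        for i in list(remaining):
            # Reduce each workload by the CPU time consumed in the interval.
            remaining[i] -= rates[i] * dt
            if remaining[i] <= eps:
                finished[i] = elapsed      # this initiator drops out
                del remaining[i]
    utilization = sum(work.values()) / elapsed if elapsed else 0.0
    return finished, elapsed, utilization


def equal_share(remaining):
    """Toy rate solver (assumption): one CPU split evenly among initiators."""
    return {i: 1.0 / len(remaining) for i in remaining}


finished, turnaround, utilization = run_mix(
    {"INIT1": 10.0, "INIT2": 30.0}, equal_share
)
print(finished, turnaround, utilization)
```

In this toy run INIT1 drops out first, after which INIT2 receives the whole CPU; a realistic rate solver would instead return rates that reflect I/O waits and operating system overhead.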

3. Data Preparation 

An actual CPU sizing study was performed using 
workload information supplied by a large application 
on IBM mainframes. The workload characteristics 
(CPU problem program seconds and I/O (EXCP) 
counts by device type) were obtained using System 
Management Facility (SMF) log data collected on
site. The objective of the study was to predict CPU
utilization and workload turnaround for future 
projected workloads using the current batch workload 
as a baseline. Future workload characteristics were 
defined by the corporate Performance and 
Configuration Group. 

An SMF Stripper program was used to extract the job 
characteristics used for input to the modeling program.
The SMF stripper program stored the stripped data in 
a hierarchical database from which job and system 
summary reports were generated to help validate the 
models' predicted results. 

4. Model Validation 

For the purpose of validating the model with respect
to a baseline, one day was chosen from the monthly
workload as input to the model program. The CPU
utilization and workload turnaround for the test day
were obtained via reports generated by the database
system. Data for this day reflected heavy CPU and
I/O activity. Selected jobs were extracted from the
database, assigned to initiators and input to the model.
The jobs were assigned to four initiators according to
their I/O rate, with highest priority given to I/O bound
jobs. These assignments could have been made
according to the actual initiators assigned to the
baseline workload, but this information was not
available. However, the method used to assign jobs to
various priority initiators by their I/O boundedness
(I/O rates) is in common use by computer system
planners in charge of job scheduling, the idea, of
course, being to overlap I/O and CPU utilization. The
results, actual and predicted, are shown in Table 1.

Table 1

Validation Results

                         Actual    Predicted    % Error

Problem program (pp)     295.64    295.64        0.0
(minutes)

Supervisor state (ss)     70.33     62.5        11.1
(minutes)

CPU utilization           55.4      51.1         7.3
(percent)

Workload turnaround       11.0      11.68        5.8
(hours)

Footnote 2: See Appendix A for definition.

It is important to note that the actual workload
contained elements which we were not able to model
and which were irrelevant to the batch workload
modeled. These elements (e.g., IMS start-up and I/O
from started tasks) were minimal but do inflate the
observed error on supervisor state, which we believe to
be in the neighborhood of 3%-4% based on previous
validation experiences with the model. Also, certain
approximations to noncritical operating system
overhead in the model are "noisy".

5. Batch Workloads 

Nine workloads were defined by the Performance and 
Configuration Group as input to the model. Table 2
briefly describes each of these workloads. Again, the 
objective of the study was to predict CPU utilization 
and turnaround time for each workload on various 
machines and to gain insight into the behavior of the 
workloads under machine upgrades. 

Table 2

Batch Workloads

Workload        Description

WORKLOAD A      anticipated workload for end of
                month processing

WORKLOAD B      anticipated workload for end of
                week processing

WORKLOAD C      anticipated workload for middle
                of week processing

WORKLOAD A1     subset of WORKLOAD A which
                could be split out and run on a
                separate processor (termed
                primary) in a dual computer
                environment

WORKLOAD A2     subset of WORKLOAD A
                (containing the remaining jobs of
                WORKLOAD A not included in
                WORKLOAD A1) which could
                be run on a second processor
                (termed support) in a dual
                computer environment

WORKLOAD B1     subset of WORKLOAD B for
                primary processor

WORKLOAD B2     subset of WORKLOAD B for
                support processor

WORKLOAD C1     subset of WORKLOAD C for
                primary processor

WORKLOAD C2     subset of WORKLOAD C for
                support processor



Each workload was run through the model for various
machines (3031, 168-3, 168-3 MP, 3033, 3033 MP).
Each workload run utilized four initiators. For the
purpose of comparison, the model results were divided
into three groups as follows:

Group I - consisting of workloads A, B and C

Group II - consisting of workloads A1, B1 and C1

Group III - consisting of workloads A2, B2 and C2

The results from the various model runs were plotted
by group and are shown in Figures 1-9. These graphs
depict the workload turnaround times, the average
CPU utilization and the volume independent CPU
utilization forecast for the nine workloads modeled.
Volume independent CPU utilization represents the
CPU utilization when all initiators are competing for
CPU time and indicates the CPU power required if
the volume of work offered sustains all initiators active
over an unspecified period of time. Individual initiator
turnaround times were available, but were not plotted
for this study.

6. Discussion 

Perhaps the most interesting conclusion to be drawn
from the model results pertains to the character of the
batch workload and its behavior under various CPU
processing speeds. From the turnaround times forecast
for the different workloads, the benefits realized by
upgrading to a faster CPU (i.e., going from a 3031 to
a 168-3, 3033, or 3033 MP) become less and less.
Looking at Figures 1, 4, and 7 (Workload Turnaround),
one can observe that the workload turnaround times
predicted for a 3033 and 3033 MP are very close. The
reason for this phenomenon is that the modeled
workloads are becoming constrained by their level of
I/O activity and the number of initiators assigned to
the mix.

Another characteristic of this phenomenon is that, in
addition to the number of initiators assigned to the
mix, the effectiveness of a particular assignment of
jobs to initiators is not invariant under an increase in
the speed of the CPU. For example, in comparing the
workload turnaround time for workload A (Figure 1)
with a subset of itself, workload A2 (Figure 7), the
following facts are noted:

i. The assignment of jobs to initiators (the initiator
structure) is different in Figure 1 than in Figure
7, but within each figure the initiator structure is
constant across machines.

ii. On the 3031 the turnaround time for workload A
is 48.5 hours and that for workload A2 is 44.6
hours. On the 3033 MP the turnaround time for
workload A is 13.8 hours while that of workload
A2 is 15.2 hours.

These facts suggest that as a workload becomes more
I/O bound (i.e., as the CPU becomes faster) the initiator structure
may become a more predominant factor in turnaround
time. The effect of initiator structure on turnaround
time as a function of machine speed is being further
investigated.
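
The reversal can be made concrete with a short Python calculation over the four turnaround figures quoted above. Only those four numbers come from the study; the dictionary layout and the `speedup` helper are illustrative.

```python
# Turnaround times (hours) quoted in the text for workloads A and A2.
turnaround = {
    "A":  {"3031": 48.5, "3033 MP": 13.8},
    "A2": {"3031": 44.6, "3033 MP": 15.2},
}


def speedup(workload):
    """Ratio of 3031 turnaround to 3033 MP turnaround for one workload."""
    t = turnaround[workload]
    return t["3031"] / t["3033 MP"]


for name in turnaround:
    print(f"workload {name}: {speedup(name):.2f}x faster on the 3033 MP")
```

Workload A2, a subset of A, finishes sooner than A on the 3031 but later than A on the 3033 MP, so the full workload gains noticeably more from the upgrade than its subset does under a different initiator structure.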

The above issues suggest that, to ensure batch workload
turnaround improvement when upgrading to a faster
CPU, the initiator structure should be studied. It
may be necessary to change the number of initiators as
well as the method of assignment of the jobs to the
initiators. The optimum results will be a function of
dependency relationships between jobs in the workload.
Further, as mentioned earlier, average I/O behavior
was assumed throughout this study. Upgrading to a
faster CPU will speed up the delivery rate of I/O
requests to the I/O subsystems. This may cause I/O
bottlenecks in the actual system. Such a situation may
be relieved by a better distribution of data, balancing
across channels, controllers, and devices, or by purchase
of additional peripheral hardware.

7. Summary

An algebraic model for a multiprogrammed computer
system has been developed (Appendix A) and an
application of the model to a problem in CPU sizing
described. A more extensive algebraic model
incorporating other sources of contention (I/O
subsystem, memory, etc.) has been developed and will
appear in a forthcoming paper. Algebraic models
other than simultaneous linear equations are available.
I would like to take this opportunity to acknowledge
many early stimulating discussions with H. Pat Artis
on dynamic job scheduling, which motivated this
research on algebraic models of computer systems.

APPENDIX A

An Algebraic Model for CPU Sizing
A.1 Introduction

A new methodology for modeling computer
systems has been developed using sequences of
algebraic models. The approach can be used in
conjunction with dynamic mix analysis techniques on
workload clusters [1] and is similar in flavor to
operational models of computer systems [2]. The
particular algebraic model used in this paper for CPU
sizing will be introduced as a set of simultaneous
equations. The modeling approach taken here consists
of the following main points:

a. The conversion of application workload into
workload on the various components of the
system.

b. The rates at which this work is carried out
can be represented by the utilizations of
the components involved (CPU, channels,
devices, etc.).

c. By studying the logic of the operating
system, critical sequences of highly
repetitive operating system module activity
can be isolated. These are primarily
sequences necessary to support the read
and write activity of an application
program and include as elements such
modules as the I/O supervisor, interrupt
handler, and dispatcher.

This information, together with certain 
application program parameters, is sufficient to 
determine the induced work and work rates 
(utilizations) on all processes classically of interest in 
the system. For the purposes of this paper attention is
restricted to CPU utilization only.

In this model it is assumed that no I/O
contention (queueing) is occurring and that all I/O
operations take the "average" time to complete. A
parameter to account for deviations from this
assumption is built into the model for potential use. It
is also assumed that paging (or swapping) is not a
significant problem and, in a similar manner as for I/O
degradation, a parameter to account for deviations
from this assumption can be built into the model.
Submodels can then be used to estimate these
parameters. This expanded approach has been used
and benchmark validated in a more complete algebraic
model of a computer system than outlined in this
paper.

We also assume there are N jobs 
competing for the central processing unit (CPU) under 
a preemptive priority dispatching discipline for service. 
An application program in control of the CPU can be 
interrupted and lose control of the CPU if either a 
program of higher priority preempts or the program 
itself issues a request for an I/O operation. 

We are interested in isolating certain key 
parameters of application programs which characterize 
the logical interaction between program and operating 
system modules. These parameters, ideally, will be 
invariant under multiprogramming since they will 
represent interaction with the operating system, which 
must be carried out regardless of the collection of 
programs with which a given program is executed. 
Total number of application program CPU seconds
consumed and the number of I/O's issued per
application program CPU second are examples of such
invariant parameters. As we will see, the latter will
allow us to calculate the CPU time expended by the
operating system to service the application program
requests for I/O operations as a function of the
multiprogrammed mix.



The following is a list of operational
definitions necessary to describe the quantities used in
the model. For a given application program, $P_i$, the
following standalone attributes are defined.

$C_i$ - total application problem program CPU time
for program i

$T_i$ - total single thread elapsed time of program i

$E_i^t$ - total number of tape I/O's issued by program i

$E_i^d$ - total number of disk I/O's issued by program i

$e_i = (E_i^t + E_i^d)/C_i$ is called the I/O rate of program i

$r_i = C_i/T_i$ is called the CPU rate of program i

The following multiprogrammed attributes are also
defined.

$\hat{T}_i$ - total elapsed time of program i when
multiprogrammed

$\hat{r}_i = C_i/\hat{T}_i$ is called the degraded CPU rate for
program i

$\Delta t^t$ - average time to do a tape I/O

$\Delta t^d$ - average time to do a disk I/O

Certain quantities related to the operating system will
also be referenced as follows.

$\theta'$ - sum of CPU timings through all operating
system modules invoked to support one I/O.
The sequence of operating system modules
invoked is called the I/O critical path.

$N$ - number of programs running in the
multiprogrammed job mix (i.e., number of
initiators active).

$\Omega(N)$ - CPU rate of all operating system module
activity not on the I/O critical path. This is a
function of $N$ and the multiprogrammed mix
characteristics.

A.2 Model Formulation 

A.2.1 Operating System. If we assume the operating
system modules run in supervisor state disabled for
interrupts, then the operating system modules, once
they gain control of the CPU, cannot be preempted by
an application program. Therefore, operating system
modules on the I/O critical path, invoked by the
process of issuing an I/O request, take on the priority
of the invoking programs in order to gain control of
the CPU, but once in control, they behave as the
highest priority jobs in the multiprogrammed mix.

Since an operating system module is itself 
a program, the CPU rate of the module is proportional 
to the CPU rates and I/O rates of the programs 
invoking it. Once activated, the module will degrade 
all application program CPU rates with the exception 
of the invoking program. All operating system module 
activity not included in the I/O critical path is lumped 
into one pseudo-operating system module which is 
considered to run at highest priority with CPU rate
$\Omega(N)$.

The CPU rate of that portion of the
operating system representing modules on the I/O
critical path is given by

$$I = \sum_{i=1}^{N} \hat{r}_i \, e_i \, \theta'$$

where

$\hat{r}_i$ - CPU seconds/elapsed second
$e_i$ - I/O's/CPU second
$\theta'$ - CPU seconds/I/O

Note that $\hat{r}_i$ is the actual degraded CPU
rate of program i which we will ultimately solve for.

If $S$ is the total operating system CPU rate,
then

$$S = \Omega(N) + I$$

In order to fully characterize the total
operating system CPU rate, it is necessary to
determine $\Omega(N)$. Total CPU rate of all I/O critical
operating system modules is obtainable as well as total
operating system CPU rate. Hence, we may assume
both $S$ and $I$ are known quantitatively. Unpublished
experimental results with operating systems have
indicated that $\Omega(N)$ varies linearly with increasing
multiprogramming level. We therefore assume

$$\Omega(N) = k(N)\,S$$

where $k(N)$ is a linear function depending on the
number of programs in the mix. Since $S$ is $\Omega(N) + I$,

$$\Omega(N) = k(N)\,(\Omega(N) + I)$$

or

$$\Omega(N) = I \, \frac{k(N)}{1 - k(N)}$$

$\Omega(N)$ is now seen to be a function of multiprogramming
depth and, through the expression $I$, also a function of
the I/O characteristics of the programs being
multiprogrammed. The expression $k(N)/(1-k(N))$ will
be referred to simply as the term $K$, the dependence on
$N$ being assumed. $k(N)$ can be empirically
determined given $\theta'$, the total number of I/O's in a mix,
the multiprogramming level, $S$, and the discussion in this
section.
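
As a numerical check on the relations in this section, the following Python sketch evaluates the I/O critical path rate, the remaining operating system rate, and their sum for a small mix. The function name `os_cpu_rates` and all input values are made up for illustration; in particular the value of k is an assumption, not a measured constant.

```python
def os_cpu_rates(degraded_rates, io_rates, theta_prime, k):
    """Return (I, omega, S) for one multiprogrammed mix.

    degraded_rates -- degraded CPU rate of each program
                      (CPU seconds per elapsed second)
    io_rates       -- I/O rate of each program (I/O's per CPU second)
    theta_prime    -- CPU seconds spent on the I/O critical path per I/O
    k              -- value of k(N) at this multiprogramming level, 0 <= k < 1
    """
    # I: CPU rate of operating system modules on the I/O critical path.
    i_rate = sum(r * e * theta_prime for r, e in zip(degraded_rates, io_rates))
    # K = k / (1 - k), so that omega = I * K.
    big_k = k / (1.0 - k)
    omega = i_rate * big_k
    # S: total operating system CPU rate.
    return i_rate, omega, i_rate + omega


# Hypothetical two-program mix.
i_rate, omega, s_rate = os_cpu_rates(
    degraded_rates=[0.3, 0.2],
    io_rates=[100.0, 50.0],
    theta_prime=0.001,
    k=0.2,
)
print(i_rate, omega, s_rate)
```

With these inputs the critical path accounts for about 0.04 CPU seconds per elapsed second and the other operating system activity for about 0.01, and the result satisfies the assumed relation that the non-critical-path rate equals k times the total operating system rate.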

A.2.2 Application Programs. Next we turn our
attention to determining the degraded CPU rates $\hat{r}_i$ of
each program in the mix in order to determine total
CPU utilization required and the turnaround time for
the mix.

Essential to the modeling technique 
developed in this paper is a three level hierarchical 
view of time. Given any process, one considers first 
the passage of elapsed (clock on the wall) time. 
Within an interval of such time, certain subintervals 
may be made available within which a process can 
become active. That is, the process may become active 
for none, some, or all of the subinterval. Hence, one 
can distinguish three types of time: elapsed time, 
available time, and process active time. This situation 
is indicated in Figure A-1. 



[Figure A-1. Three level view of time: process active time, available time for process activity, and elapsed time.] 



Note that if there are several processes 
competing for available time within an interval of 
elapsed time, then what is nonavailable time for one 
process may be available time for competing processes. 
This is true, for example, when programs compete for 
the elapsed time of a CPU. On the top line of Figure 
A-1, the dotted regions represent time spent on 
competing processes and the hashed region time that 
was available to a process but not used, i.e., the 
process was not active. The darkened regions indicate 
process active time. There are, therefore, two possible 
ways to represent a process utilization: with respect to 
elapsed time and with respect to available time. This 
distinction is of fundamental importance in the solution 
of many computer system problems, since any attempt 
to increase a process utilization beyond the maximum 
value indicated when utilization is computed with 
respect to available time must be achieved at the 
expense of competing process utilization. 
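
The two ways of expressing utilization can be made concrete with a short sketch. It assumes the three quantities of Figure A-1 are measured in the same time unit; the function name and the sample numbers are illustrative and do not come from the paper.

```python
def time_utilizations(active, available, elapsed):
    """Return (utilization w.r.t. elapsed time, utilization w.r.t. available time)."""
    if elapsed <= 0.0 or not 0.0 <= active <= available <= elapsed:
        raise ValueError("expected 0 <= active <= available <= elapsed, elapsed > 0")
    wrt_elapsed = active / elapsed
    # With no available time the process cannot have been active at all.
    wrt_available = active / available if available > 0.0 else 0.0
    return wrt_elapsed, wrt_available


# A process active for 12 s of the 20 s made available to it in a 60 s interval.
print(time_utilizations(12.0, 20.0, 60.0))
```

In this example the process uses only a fifth of the elapsed interval but three fifths of what was available to it. Active time cannot exceed the available time, so raising it beyond 20 s would require taking time from the competing processes.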

If program execution in a 
multiprogrammed computer system is viewed as a 
process competing for available CPU time, then the 
degraded CPU rate of a program, as the model will 
calculate it, is equivalent to process utilization with 
respect to elapsed time and not available time. Its 
single thread CPU rate is equivalent to process 
utilization with respect to available time. For the 
single thread (standalone) case, available time is 
precisely elapsed time. The degraded CPU rate for 
program i, $\hat{r}_i$, is unknown since $T_i$, the degraded 
elapsed time, is not known. However, $\hat{r}_i$ may be 
represented as an analytic expression derived as 
follows. 

Let us assume that program i has higher 
priority than program i + 1 and that the operating 
system modules, once they have gained control of the 
CPU, cannot be preempted. Further, suppose a unit of 
available CPU time is made accessible to programs in 
a multiprogrammed mix. We would like to 
characterize the "average" competition for the unit of 
available CPU time by the programs and system 
modules constituting the mix. In other words, given a 
unit of CPU time (e.g., a CPU second), it is of interest 
to determine how the time is expended on the various 
processes competing for the CPU resource. 

Consider an available CPU second. Since 
the operating system has top priority, $\Omega(N) + I$ is the 
average fraction of the second taken by the operating 
system. The CPU time remaining for problem 
program state activity is then $1 - (\Omega(N) + I)$ seconds. 
For a particular program, the expression representing 
operating system overhead time unavailable for 
problem program activity must be modified slightly. A 
program will never be degraded by its own I/O 
operating system overhead since it is assumed that it is 
no longer competing for the CPU resource until the 
I/O request is satisfied. Hence, define 

$$I_i = I - \hat{r}_i \, e_i \, \theta'$$

so that $I_i$ represents all critical operating system 
module CPU rate except that induced by program i. 

Assuming that no I/O contention is 
present, the top priority application program will take, 
on the average, a fraction $r_1$ of the available CPU time, 
or 

$$\hat{r}_1 = r_1\,\bigl(1 - (\Omega(N) + I_1)\bigr)$$

of the initial available CPU second. The next highest 
priority program will take on the average a fraction $r_2$ 
of the remaining CPU time, or 

$$\hat{r}_2 = r_2\,\bigl(1 - (\Omega(N) + I_2 + \hat{r}_1)\bigr)$$

of the initial available CPU second. 

In general, the ith program experiencing no 
I/O contention will take on the average a fraction $r_i$ of 
the remaining available CPU time, or 

$$\hat{r}_i = r_i\,\Bigl(1 - \bigl(\Omega(N) + I_i + \sum_{j<i} \hat{r}_j\bigr)\Bigr)$$

of the initial available CPU second. Figure A-2 
indicates the initial available CPU second and the 
corresponding time lines for the equations given above. 

If it is assumed that I/O contention is 
present, then the above equations can be modified so 
that 

$$\hat{r}_i = d_i \, r_i\,\Bigl(1 - \bigl(\Omega(N) + I_i + \sum_{j<i} \hat{r}_j\bigr)\Bigr), \qquad i = 1, 2, \ldots, N \tag{A-1}$$

where $d_i$ is a parameter reflecting degradation due to 
I/O contention and $0 \le d_i \le 1$. $d_i$ may itself be 
represented as an algebraic model or as the output of a 
simulation or queueing model. 

The single thread rate $r_i$ may be approximated as follows, 

n - q/(q + (EfAtf + ejV) (a-2) 

+ d'(El + E?)) 

or obtained by actually running the job on the 
computer system. 



Equation (A-1) represents a set of N simultaneous 
equations in the N unknowns, $\hat{r}_i$. Solving this set of 
equations for the $\hat{r}_i$, with $d_i = 1$ for all i, allows us to 
then compute the following functions of the degraded 
rates. 
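
Before turning to those functions, the solution step itself can be sketched in code. The sketch rests on one observation that is not spelled out in the text: once $\Omega(N)$ is written as K times I, Equation (A-1) is linear in the unknown degraded rates, so an ordinary linear solver suffices. The use of Gaussian elimination, every function and variable name, and the sample inputs are assumptions for illustration, not the author's implementation.

```python
def solve_degraded_rates(r, e, theta, K, d=None):
    """Solve Equation (A-1) for the degraded CPU rates r_hat.

    r[i]  : single thread CPU rate of program i (highest priority first)
    e[i]  : I/Os per CPU second for program i
    theta : critical path operating system CPU seconds per I/O
    K     : k(N) / (1 - k(N)), so that Omega(N) = K * I
    d[i]  : I/O contention factor, 0 <= d[i] <= 1 (defaults to 1 for all i)
    """
    n = len(r)
    d = [1.0] * n if d is None else d
    a = [e_i * theta for e_i in e]        # critical path CPU per program CPU second
    g = [d[i] * r[i] for i in range(n)]

    # Substituting I = sum(a[j] * r_hat[j]), Omega = K * I and
    # I_i = I - a[i] * r_hat[i] into (A-1) gives the system M * r_hat = g.
    m = [[0.0] * (n + 1) for _ in range(n)]
    for i in range(n):
        for j in range(n):
            coeff = (K + 1.0) * a[j]
            if j == i:
                coeff -= a[i]             # a program is not degraded by its own I/O
            if j < i:
                coeff += 1.0              # CPU already taken by higher priority programs
            m[i][j] = g[i] * coeff + (1.0 if i == j else 0.0)
        m[i][n] = g[i]

    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    r_hat = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * r_hat[j] for j in range(i + 1, n))
        r_hat[i] = (m[i][n] - known) / m[i][i]
    return r_hat


def cpu_utilizations(r_hat, e, theta, K):
    """Return (S, P, U): supervisor state, problem program, and total utilization."""
    P = sum(r_hat)
    S = sum((K + 1.0) * e_i * theta * rh for e_i, rh in zip(e, r_hat))
    return S, P, S + P


# Illustrative (made-up) two-program mix.
rates = solve_degraded_rates([0.5, 0.4], [10.0, 20.0], 0.002, 0.25)
print(rates, cpu_utilizations(rates, [10.0, 20.0], 0.002, 0.25))
```

The sketch does not guard against a singular system. A production version would also check that the resulting rates are nonnegative and that total utilization does not exceed one.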



Contribution of each Program to Total CPU Utilization 

$$\underbrace{\hat{r}_i}_{\text{PP}} + \underbrace{(K + 1)\, e_i \, \theta' \, \hat{r}_i}_{\text{SS}}$$

Turnaround (Service, Response) Time for Program i 

$t_i$ = [expression illegible in the source] 

Supervisor State (SS) CPU Utilization 

$$S = \Omega(N) + I = \sum_{i=1}^{N} (K + 1)\, e_i \, \theta' \, \hat{r}_i$$

Problem Program (PP) CPU Utilization 

$$P = \sum_{i=1}^{N} \hat{r}_i$$

Total CPU Utilization 

$$U = S + P = \sum_{i=1}^{N} \bigl(1 + (K + 1)\, e_i \, \theta'\bigr)\, \hat{r}_i$$

[Figure A-2. The initial available CPU second and the corresponding time lines for the equations above.] 


APPENDIX B 

MIPS RATES 

Processor      MIPS 
3031           1.2 
168-3          2.7 
168-3 MP/AP    4.5 
3033           4.6 
3033 MP/AP     7.6 

[Figure: Average CPU Utilization (%), Group I] 
This paper will discuss the concept of software engineering as defined by 
Dr. Barry Boehm as "the application of science and mathematics by which 
the capabilities of computer equipment are made useful to man via computer 
programs, procedures, and associated documentation." It can serve as a 
starting point for developing and institutionalizing a modern Software 
Engineering Technology (SET). The document defines key elements which 
comprise a SET and suggests a method for formulation of these elements 
into a technology that encompasses all of the primary stages of the 
software life cycle. It presents a discussion of the types of modern-day 
software engineering practices. The approach emphasizes the incremental 
integration of software tools into the technology as a means of increasing 
productivity, establishing regularity and uniformity, and improving 
control over software systems. 

"The past is but the beginning of a beginning, and all that is and has 
been is but the twilight of the dawn." 

H.G. Wells 
The Discovery of the Future (1901) 

Key Words: Software Engineering Technology (SET); software engineering; 
software tools; software management. 



1.1 Software Engineering 

Software engineering encompasses a wide range of 
techniques and methods for managing, developing, 
and maintaining computer software. It is 
sometimes thought of in the more restrictive 
sense to cover only programming methodology. 
Software engineering actually covers a much 
broader scope and includes all of the 
disciplines which are used in dealing with 
software throughout its life cycle. 

A more formal definition of software engineering 
is provided by Dr. Barry Boehm as "the 
application of science and mathematics by which 
the capabilities of computer equipment are made 
useful to man via computer programs, procedures, 
and associated documentation" [1]. Software 
engineering is sometimes referred to as the 
discipline which brings order to the software 
life cycle development process. There are many 
techniques such as top down design, structured 
programming, thread testing, HIPO charts, etc., 
all of which bring a form of discipline to the 
development and management of software. 
Defining the techniques employed by an 
organization through the use of standards and 
procedures is a primary means of establishing 
order in software development and management. 






Along with a discipline come the measurements 
and controls which permit software to be 
evaluated and subsequently managed. Activities 
such as critical design reviews, program design 
reviews, and software change controls usually 
accompany software engineering methodologies. 
These functions are necessary in order to 
determine if the software meets the needs of the 
end user and is easy to maintain. A Software 
Engineering Technology (SET) includes the 
development of software standards, development 
of computer and manual procedures, 
the necessary software tools, and integration 
of these elements into a suitable 
environment. The aim of the SET is that of 
a practitioner's approach to software 
engineering [2]. In doing so the technology 
must be flexible, yet should not sacrifice 
principles. A comprehensive SET for software 
development must consider all aspects of the 
software life cycle if desired results are to be 
obtained. For this to be possible, the factors 
which affect the software life cycle phases must 
be identified and a formal technology 
established. 



1.2 Components of a SET 



A SET consists of two major components: stages 
and elements. Stages are discrete phases of the 
software life cycle identified by the type of 
activities associated within the stage. The six 
stages of the software life cycle for a SET are: 

requirements definition and analysis; 
design; 
programming; 
validation; 
operation; and 
review. 

Elements of a SET are the principal factors 
which direct and control the software activities 
within each software stage. The five elements 
for a SET are: 

standards; 
procedures; 
tools; 
quality assurance; and 
training. 

The combination of these five elements within a 
particular environment over the six stages of 
the software life cycle constitutes a SET. This 
SET can be built in such a way that resources of 
varied quality can be used to accomplish similar 
engineering tasks. Without any one of the five 
elements, an engineering project will not 
produce a technology, but rather an unused set 
of tools, standards, and procedures [3]. 

The association of the five different elements 
with the six stages is viewed as a two 
dimensional matrix as illustrated in Figure 1. 
The blocks of the matrix, which are the 
intersections of elements within a stage, 
represent the theoretical details of the 
technology. After the SET is developed, these 
blocks should contain the detailed documentation 
for the standards, procedures, tools, quality 
assurance, and training for each stage. The 
lines dissecting the stages and elements provide 
a natural boundary for the SET components. These 
divisional boundaries are logical control points 
in the technology from which reviews can be 
performed, quality ascertained, errors 
identified and corrected, and management 
decisions made. Also, the vertical lines 
dividing the stages are logical points for 
products, deliverables, or milestones to occur. 
In addition, the system environment (hardware, 
methods of operation, etc.) provides a third, 
unillustrated dimension to the matrix. As an 
organization manages or changes its data 
processing environment, the matrix may have to 
be modified accordingly as additional tools, 
procedures, and standards enter the environment. 

1.3 SET Iterative Review Process 

Because of the numerous directions software can 
take, software control becomes a process of 
iterative reviews. All components within the 
software life cycle need to be checked and 
rechecked to ensure that none of the meaning or 
usefulness desired from the software is lost. 
One method, as reflected in this SET, is to 
continually test the quality of products from 
each stage in the software life cycle. 



1.4 Key Concepts of a SET 



The foundation for a SET consists of the 
objectives to be achieved from a technology. 
The building blocks for meeting these objectives 
are the standards, procedures, tools, quality 
assurance, and training elements available in an 
appropriate environment. For the technology to 
be effective, these elements must be 
interrelated so that each element complements 
the other. 

Two of the most important concepts for making 
effective use of any SET are the controls 
administered and management support for the SET 
program. Management must be willing to allocate 
a sufficient, but reasonable, amount of money, 
personnel, time, and other resources to the SET 
program; not only to develop it, but also to 
maintain and control it. 



1.5 SET Objectives 

No organization or technology would have 
justification for existing unless identified 
benefits were realized. This is particularly 
true for a software engineering technology. 
Thus, it is paramount that defined objectives be 
in place so there is a common goal upon which a 
SET can be built. Primary objectives of the SET 
are to: 



reduce overall cost of software by 
reducing the effort spent on 
maintenance through better planning, 
design, testing, and control of 
software resources; 

improve software reliability by 
increasing the degree of software 
correctness, minimizing uncertainty, 
maximizing data integrity, and 
improving system security; 

improve maintainability of software by 
making software structure consistent, 
isolating functions, having up-to-date 
documentation, facilitating program 
understanding, standardizing 
interfaces, managing mechanisms to 
control and apply changes or 
enhancements quickly, etc.; 

improve portability by isolating 
architecture dependencies, reducing or 
eliminating operator intervention, 
eliminating nonstandard source code, 
using high-level computer languages, 
and reusing source code (i.e. reducing 
development redundancy); 

ensure software is usable by making 
sure it meets requirements and 
maximizes error detection in the early 
stages of software development; 

obtain regularity and uniformity in 
such areas as program format, naming 
conventions, programming standards, 
system design methodology, data 
isolation, maintenance change 
methodology, function 
interchangeability, single-source 
maintenance, and test and acceptance 
criteria; 

establish controls and measurement of 
software for project monitoring, 
evaluation of software, management 
decisions, tracking, auditing, etc.; 

improve organization productivity 
through increased programmer 
productivity, increased hardware 
performance, reduced learning curves, 
increased programmer effectiveness, and 
broader skills at both technical and 
management levels; and 

improve product quality and 
responsiveness to user requirements. 



1.6 Environment's Effect on the SET 



There are certain factors which have an effect 
or influence upon the development and 
implementation of a SET. This common influence 
on the SET is referred to as the environment. 
The environment is the aggregate of 
organizational, technical, and managerial 
conditions of the ADP activity, which make the 
definition of each SET unique. Thus, the effect 
the environment has upon the SET is different 
for each organization. Add to this the fact 
that an organization's environment is constantly 
changing with time. Therefore, the SET must be 
flexible and conducive to changes in the 
environment. 



2. Overview of Software Life Cycle Stages 

The software life cycle includes the six stages 
of: 

requirements definition and analysis; 

design; 

programming; 

validation; 

operation; and 

review. 
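
Taken together with the five elements named in Section 1.2, these six stages index the two dimensional matrix of Figure 1. The following is a minimal sketch of that matrix as a data structure, assuming each block simply holds a list of document titles; the representation, the helper names, and the sample document title are illustrative assumptions and are not part of the SET as described.

```python
from itertools import product

STAGES = (
    "requirements definition and analysis",
    "design",
    "programming",
    "validation",
    "operation",
    "review",
)
ELEMENTS = ("standards", "procedures", "tools", "quality assurance", "training")


def empty_set_matrix():
    """One block per (stage, element) pair; each block lists its documentation."""
    return {(stage, element): [] for stage, element in product(STAGES, ELEMENTS)}


def missing_blocks(matrix):
    """Blocks with no documentation yet, i.e. gaps in the technology."""
    return [block for block, documents in matrix.items() if not documents]


matrix = empty_set_matrix()
matrix[("programming", "standards")].append("Source code format standard")
print(len(matrix), len(missing_blocks(matrix)))
```

Listing the empty blocks is one simple way to see which parts of the technology still have to be developed.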

2.1 Requirements Definition and Analysis Stage 

The Requirements Definition and Analysis Stage 
begins after a feasibility study has recommended 
that a new or modified system be developed for 
computer processing, and that recommendation has 
been accepted by management. The purpose of 
this stage is to define specific functional 
capabilities required for the system, define 
performance requirements that must be met, and 
identify all information that the system will 
use or produce. The results of this stage 
become input to the next stage, which is the 
design of the new or modified system. To avoid 
limiting the system designer's ability to 
consider a variety of design options, 
requirements defined by this stage must 
concentrate on specifying what the system is to 
do — not how to do it. 

Analysis includes the study of a business area 
or application, leading to the specification of 
a new or modified system. It consists of 
interviewing the user about what the current 
system does or should do, what extra features 
are desired in the new system, and what 
constraints should be placed on the new system. 
There are repeated interactions with the user in 
order to reach a clear understanding of new 
system requirements. The most important product 
of system analysis is the functional 
specification. The user must be given adequate 
time to review the functional specification 
before it is passed to the system designer, as 
this can alleviate problems later. If care is 
taken to ensure that all of the data and 
processing requirements have been identified, 
then required user changes can be kept to a 
minimum. However, even though care is taken to 
review and finalize requirements with the user, 
the definition of requirements is an interactive 
process throughout the system life cycle, with 
specification changes identified during later 
life cycle stages. 



2.2 Design Stage 



Design includes the tasks of detailed 
specification of software, data, hardware and 
processing requirements, and the segmentation of 
processes into programs. The design process 
involves moving through successive levels until 
the system has been defined sufficiently for 
software development to begin. This 
decomposition facilitates human comprehension by 
breaking down a large, complex problem into 
smaller, more manageable pieces. 

In addition to moving through successive levels 
of conceptualization in decomposition of the 
problem, there is also some iteration of the 
steps in the design process. Most often the 
initial solution to the problem is not the best; 
and as the designer moves from one level of 
conceptualization to the next, more insight is 
gained into the ramifications of the problem and 
design refinements may then be made. 

A system can be viewed as a group of data 
processing entities. The designer describes 
these entities in terms of their input/output 
and processing performed to accomplish the 
transformation from input to output. Interfaces 
and control sequences of the data processing 
entities are described along with descriptions 
of the data that flows between them. This forms 
the "logical" structure of the system. 



2.3 Programming Stage 



Programming includes detailed design of the 
processing logic, preparation of a plan for 
testing the program, development of test data, 
code production, unit testing the code, and 
documentation. This stage begins with the 
receipt of a specification for the development 
of a new program or the modification of an 
existing program. The programmer's first act is 
to obtain a good definition of the problem to be 
solved. He analyzes requirements, asks 
questions if clarification is needed, and 
recommends modifications to the program 
specification if it contains errors, omissions, 
or obvious inconsistencies. 

Once the problem is thoroughly defined, detailed 
program design begins. The program design is 



complete when the program specification, 
describing program construction, is finished. 
To predict whether the designed program will 
function as intended, designs should be verified 
by someone other than the designer as the 
designer cannot always objectively evaluate his 
own work. 

After program design, the programmer codes the 
program. In some instances design may not be 
completely finished before coding begins. 
Modern programming techniques such as 
modularization and top down development, permit 
flexibility in scheduling program development 
activities by separating the program into 
logically distinct portions which can be 
developed and tested independently. As a 
result, there is a great deal of overlap of 
design, coding, testing, and documentation 
activity throughout the Programming stage. 

Unit testing follows coding. The purpose of 
unit testing is to ensure that the unit, or 
program, released for system testing is free 
from internal logic or format errors and 
conforms to its specifications. 



2.4 Validation Stage 



Validation encompasses system testing followed 
by acceptance testing. The purpose of system 
testing (sometimes referred to as integration 
testing) is not to retest all of the detailed 
functions within each program since that would 
unnecessarily duplicate unit testing. Instead, 
its purpose is to connect the program units to 
determine whether they function together in 
tandem. Thus, the main testing emphasis is on 
the interaction between and interoperability of 
software components and their interfaces. 

Planning for system testing begins in the 
earlier stages of the software life cycle. The 
system should be tested against functional 
specifications produced during the Requirements 
Definition and Analysis stage. For this reason, 
adequate specifications for quality must be 
included and the requirements stated in a way 
that can be tested. During the Design stage, 
when components of the system are first 
identified, the system test plan is produced as 
part of the system specification. The system 
test plan defines how modules or programs of the 
system are to be sequenced and pieced together. 
It defines the order of integration, the 
functional capability of each version of the 
system, and the responsibilities for producing 
code that simulates the functions of nonexistent 
components. Development of test cases for 
system testing is accomplished during any or all 
of the Design, Programming, and/or Validation 
stages. System test cases consist of test data 
and scenarios supplied by the user, programmer, 
and/or system testing component. 
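
The "code that simulates the functions of nonexistent components" called for by the system test plan is what is commonly called a stub. A minimal sketch follows. The invoicing function, the tax table component, and the canned rates are all hypothetical and serve only to show one finished unit being exercised against a stand-in for a component that does not yet exist.

```python
class TaxTableStub:
    """Stands in for a tax table component that has not been written yet."""

    def rate_for(self, region):
        # Canned answers chosen for the test scenario, not real tax rates.
        return {"north": 0.10, "south": 0.05}.get(region, 0.0)


def invoice_total(net_amount, region, tax_table):
    """Finished unit: combines a net amount with a rate looked up elsewhere."""
    return round(net_amount * (1.0 + tax_table.rate_for(region)), 2)


# The finished unit can be integrated and tested before the real component exists.
print(invoice_total(200.0, "north", TaxTableStub()))
```

When the real component is delivered, it replaces the stub without any change to the unit under test, provided both offer the same interface.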
2.5 Operation Stage 

The Operation stage begins after a software 
product has been validated to the satisfaction 
of the user and has been accepted for production 
processing. This applies to both new 
development efforts and enhancements to existing 
systems. The Operation stage includes the 
execution of software and the interface between 
development and operational components necessary 
to ensure proper execution. 

2.6 Review Stage 

The Review stage begins after software is 
operational and basically in a maintenance mode. 
The purpose of the Review stage is to provide 
for periodic performance evaluations of 
operational software systems. The evaluation 
must address performance characteristics from 
the perspective of the user, development, and 
operations components with a view toward 
alleviating errors and recommending 
improvements. Review can be initiated based 
upon management direction, problems, software 
age, or scientific selection. Scientific 
selection is the preferable method since it is 
the least subjective. The review itself is most 
effective as an internal peer group activity 
conducted at the project level. Activities 
associated with this stage include: 

qualitative and quantitative data gathering; 
data evaluation; and 
trend analysis. 



3. Overview of the Five Elements 



Establishment of a SET in any ADP environment 
consists of the development of a detailed plan 
that describes and integrates the five major 
elements of: 

standards; 
procedures; 
tools; 
quality assurance; and 
training. 

This combination of standards and guidelines, 
procedures, tools, quality assurance, and training 
forms the basis for a software technology [3]. 

3.1 Standards 

The term "standard" can be defined as that which 
is established by authority, custom, or general 
consent as a basis for comparison. Data 
processing standards can be grouped into two 
major categories: methodology standards and 
performance standards. 

Methodology standards are uniform practices. 
Performance standards are metrics to evaluate 
performance. Methodology standards are rules of 
procedure or instructions on "how-to-do-it." 
Performance standards specify how well a 
function should be performed by people, 
software, and machines. For example, 
programming methodology standards would indicate 
how a program is to be coded (i.e., what coding 
form to use, how characters should be 
hand-written for input, the format of the source 
code listing, etc.). The programming 
performance standards would state how long 
program coding should take, given the experience 
of the programmer, complexity of the job, and 
other relevant factors or constraints. 



3.2 Procedures 

Procedures are methods of doing business within 
an organization. They define processes which 
are followed on a project or the manner of 
proceeding with a task. For a SET, procedures 
are really the "what-to-do" and "when" in 
performing software tasks. 

There are several types of procedures particular 
to a SET. There are undocumented and documented 
procedures, as well as manual and automated 
procedures. Undocumented procedures are the 
methods or processes which are performed based 
upon tradition. Normally these procedures are 
communicated from person to person by 
word-of-mouth. Approval chains or signoff 
procedures are quite often examples of this type 
of procedure. 

Documented procedures are formal written 
procedures in a data processing shop such as 
software run instructions, test case development 
steps, software testing process, etc. They 
serve as a source of reference to personnel for 
maintaining consistency and uniformity in 
completing software development tasks. 
Documented procedures eliminate much of the 
trial-and-error in determining "what-to-do" and 
"when-to-do-it." Also, documented procedures are 
an invaluable source of training material when 
integrating new personnel into the organization. 
All procedures used in the software life cycle 
should be documented including those that were 
normally performed only by tradition. If not 
documented, the procedures will never become 
public knowledge or practice, and will not be 
adopted and accepted by the data processing 
personnel. 

Manual procedures involve activities such as: 
desk checking code, obtaining management 
approval for proposed system enhancements, and 
interfacing with users to identify system 
requirements. Automated procedures involve the 
use of software tools, such as: language 
analyzers, code optimizers, requirement 
analyzers, and programming support tools. 
3.3 Tools 

"A software tool is a computer program that can 
automate some of the labor involved in the 
management, design, coding, testing, inspection, 
or maintenance of other programs" [4]. 

There are many types of tools in widespread use 
throughout the ADP industry. Their use has 
become almost essential to the effective use of 
computers for any application. These tools 
range in size and complexity from simple aids 
for individual programmers to complex tools that 
can support many software projects at the same 
time. 

Tools are important because they can be used to 
produce software faster, more accurately, and 
more uniformly, while significantly improving 
personnel productivity. As a result, their use 
can become an important part of software 
development. More importantly, tools represent 
a class of software that can be used and reused 
within many different environments. Thus, the 
use of tools provides the opportunity to reduce 
costs and improve productivity while decreasing 
development time. 



3.4 Quality Assurance 



Quality Assurance (QA) is the formal process of 
measuring or evaluating the degree to which 
software meets standards (e.g., alignment of 
code or percentage of logic executed during a 
test), and/or prescribed requirements (e.g., in 
the areas of accuracy, reliability, etc.). The 
primary purpose of QA is to develop and maintain 
better software products than those which would 
otherwise have been developed and maintained 
using traditional methodologies without 
measurements or controls. 

3.5 Training 

The purpose of training is to create, through 
some type of learning experience, a permanent 
change in a person's behavior so that the 
individual reliably performs in a certain 
prescribed manner. The types and quantity of 
training required for a SET depend heavily upon 
an organization's personnel background and the 
make-up of the SET itself. 

Webster defines training as "a process or method 
to lead or direct growth; to form by 
instruction, discipline, or drill" [5]. 
Training implies a method (procedure), change 
(behavior modification), and result (growth or 
performance). Training in terms of the SET 
requires the identification of target training 
areas and specific plans of action to bring 
personnel to a higher level of knowledge and 
performance. Since the establishment, 
institutionalization, and implementation of a SET 
is evolutionary, the training plan should be 
structured to facilitate this process and 
provide training which closely corresponds to 
the SET. 



4. Planning the Development of a SET 



4.1 SET Planning 

Developing a comprehensive SET requires detailed 
planning, organizing, personnel coordination 
and, most importantly, management involvement 
and support. If the SET development effort is 
visible in an organization, the chances that a 
formal SET will be implemented increase 
substantially. Establishing such an effort as 
a planned program in an organization's plan is 
the first step in ensuring that the process will 
receive the necessary attention. Involvement by 
all parts of the organization in the development 
of a SET will increase the potential for 
successful implementation. 

Two approaches in planning the development of a 
SET can be used - a top down approach and a 
bottom up approach. A top down approach 
develops a theoretical structure of software 
engineering, and expands this to successive 
levels of detail until all tasks are completed. 
The bottom up approach identifies the current 
engineering process, and from this identifies 
which elements of the technology are missing. 
An evolutionary process of substituting tools 
for existing processes, developing missing 
procedures, standards, etc., by trial, analysis 
and modification, will evolve the current system 
towards the desired SET. 

It is important that the SET development plan 
provide a basic framework, direction, and 
overall schedule for the project. The plan 
should be structured so that it can be updated 
on a periodic basis and should address all five 
elements of: 

standards; 
procedures; 
tools; 
quality assurance; and 
training. 

An organizational planning process involves both 
tactical (short term) and strategic (long term) 
objectives. These objectives are usually 
identified in an organizational plan as a ranked 
group of milestones. The ranking process is 
conducted by management whereby each milestone 
is evaluated using criteria which mirror the 
goals of an organization. Top level management 
usually establishes these criteria, which are then 
used to prioritize and rank organizational 
objectives. The establishment of a SET requires 
detailed knowledge of an organizational plan to 
establish standards, buy software tools, develop 
procedures, and plan an adequate computing 
facility which best supports organizational 
objectives. The SET Development Plan must 
strike a balance between supporting 
organizational objectives and how rapidly the 
engineering technology is implemented. As part 
of the plan, an analysis of the current 
engineering process should be conducted and 
should be organized so that it can easily be 
compared to the new, planned process. This 
comparison will help identify actions needed to 
upgrade the process. The analysis should also 
identify high pay-off areas as this will help 
establish, priorities . From this analysis will 
emerge the long range strategy and a detailed 
plan [3], Figure 2 shows a possible outline for 
a SET Development Plan. 
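
The ranking of milestones against management criteria can be illustrated with a small weighted-scoring sketch. The paper does not prescribe a scoring formula; the criteria names, weights, 0-10 scoring scale, and milestone names below are all hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass, field

# Hypothetical criteria and weights set by top level management (assumption).
CRITERIA_WEIGHTS = {
    "cost_savings": 0.5,
    "mission_support": 0.3,
    "risk_reduction": 0.2,
}


@dataclass
class Milestone:
    name: str
    # criterion name -> score on an assumed 0-10 scale
    scores: dict = field(default_factory=dict)


def weighted_score(milestone, weights=CRITERIA_WEIGHTS):
    """Sum of weight * score over all criteria; a missing score counts as 0."""
    return sum(w * milestone.scores.get(c, 0) for c, w in weights.items())


def rank_milestones(milestones, weights=CRITERIA_WEIGHTS):
    """Return the milestones ordered from highest to lowest weighted score."""
    return sorted(milestones, key=lambda m: weighted_score(m, weights), reverse=True)


buy_tools = Milestone(
    "buy software tools",
    {"cost_savings": 8, "mission_support": 5, "risk_reduction": 2},
)
set_standards = Milestone(
    "establish standards",
    {"cost_savings": 4, "mission_support": 9, "risk_reduction": 9},
)

ranked = rank_milestones([buy_tools, set_standards])
```

Changing the weights changes the ranking, which is the sense in which the criteria "mirror the goals of an organization."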

4.2 Model for Developing a SET 

The method used for establishing a SET depends 
on a particular organization and its staff. 
There are, however, some principles which are 
characteristic of any model for developing a 
SET. A SET should be flexible in order to 
maintain an up-to-date technology. There should 
be periodic checkpoints to enable personnel to
be brought in tune with the established
technology. It should allow room for
experimentation with modern software practices.
Finally, it should allow integration of software
tools to automate manual software activities
where possible.

It is good practice to establish periodic 
baselines during the SET development process 
because developing a technology is generally 
disruptive to an organization. It frequently 
changes the way people produce and maintain
software. It requires training and makes old
procedures obsolete while requiring that new
ones be adopted. To minimize this disruption,
there should be planned periods during the SET 
development process where technology is 
stabilized so that personnel can upgrade their 
skills and put into use the software engineering 
principles formulated thus far. 

Figure 3 presents a simplified model for
developing a SET. The target SET developed from
application of the model must exist within the
boundaries of a particular environment.
Environment represents a multi-dimensional
concept that identifies the many components of
an organization and their relationships. Some
important attributes of an ADP environment
include-

. size of the software organization and
  personnel mix;

. type of organization (private,
  government, etc.);

. applications (scientific, MIS) and
  language(s);

. development environment (batch,
  interactive);

. program running environment (batch,
  real-time, etc.);

. computer type; and

. involvement in tools development [4].

These attributes should be evaluated and 
quantified in order to compile an organizational 
profile. This profile, coupled with key SET 
elements will serve to establish a custom fitted 
SET baseline. Establishing a baseline SET will 
require a proper environmental mix of SET 
elements within the software life cycle stages 
that are balanced against an organization's
needs and requirements.

4.3 Identifying Standards, Procedures and Quality Assurance



Identifying standards, procedures and quality 
assurance for a SET is a large and sometimes 
confusing task. First there are questions such 
as what software practices should be 
standardized and what processes should be
developed? Each organization developing a SET 
must eventually answer these questions. 
Although answers to the questions will be 
different, the items to be considered can be 
grouped into categories. Only then can the 
translation of these categories into 
preliminary, and ultimately, fully documented 
standards, procedures and quality assurance 
methods that fit into an organization's 
environment be performed. 



4.4 Standards Considerations 

One important area of consideration regarding 
standards is that conformance with Federal
Information Processing Standards (FIPS) is a
regulation for all Federal agencies. However, it
should be noted that there are waiver procedures
for most of the FIPS. The FIPS Publication
Series of the National Bureau of Standards, U.S.
Department of Commerce [6], is the official
publication relating to standards adopted and
promulgated under the provisions of Public Law
89-306 (Brooks Act) and under Part 6 of Title 
15, Code of Federal Regulations. These 
publications consist of guidelines and mandatory 
standards for the utilization and management of 
computers and automatic data processing systems 
used in the Federal Government. 

1. PROJECT INITIATION

   1.1 Purpose of SET Project
   1.2 Objective and Goals of SET
   1.3 SET Development Methodology
   1.4 Organizational Responsibilities
   1.5 Resource Estimate, Schedules and Milestones

2. SET REQUIREMENTS

   2.1 Task and Deliverables
   2.2 Phased Milestones
   2.3 Applicable Standards
   2.4 Manual and Computer Procedures
   2.5 Quality Assurance Mechanisms
   2.6 Tool Functions
   2.7 Environment Considerations

3. CURRENT ENGINEERING PROCESS (Similar Structure to (2))

4. PRELIMINARY SET DEFINITION

   4.1 Preliminary Tool Configuration
   4.2 Preliminary List of Standards
   4.3 Preliminary Procedures
   4.4 Preliminary Quality Assurance Mechanisms
   4.5 Training Plan

5. SET DEVELOPMENT PLAN

   5.1 Pilot Projects
   5.2 Tool Selection and Acquisition
   5.3 Standards Development
   5.4 Procedures Development
   5.5 Quality Assurance Establishment
   5.6 Training
   5.7 SET Measurement and Evaluation

Figure 2: Plan Outline for Developing a SET

Figure 3: Model for Developing a SET

4.5 Integration of Software Tools

The introduction of a tools oriented SET into a
software development environment will impact the
organization in several ways. To cope with
these changes organizations will need to-

. institute a methodology for software
  development based on the life cycle;

. establish and enforce organizational
  standards for software development;

. supply automated tools to facilitate
  the software development process;

. supply training for personnel in modern
  programming practices; and

. provide the management commitment to
  develop and sustain the environment.

The recommended approach to introducing a tools 
oriented SET into a software development 
environment is to solicit an abundance of user 
involvement; proceed to successively more
advanced levels of the SET in a systematic,
coordinated manner; constantly obtain and
evaluate feedback; and, if necessary, make
continual changes to the SET to ensure its
appropriateness to a particular environment. A
properly integrated tools oriented SET will
yield more maintainable and error-free software,
more productive programmers, improved software
management, and dramatically increased control 
over the software life cycle process. 

5. Phasing Into the Technology 



5.1 Tool Identification 



SET is an evolutionary concept that migrates 
from a simple technology, through several higher 
levels to an advanced (state-of-the-art) 
technology. SET technology differs from
existing software development methodologies in
that it relies heavily on the use of tools.
Tools are used to accelerate the development
process and to take advantage of extensive
simulation and modeling techniques which 
facilitate high quality software development. 
Examples of these include- 

. simulation tools;

. development tools;

. test and evaluation tools;

. operations and maintenance tools;

. performance measurement tools; and

. programming support tools.

Early software development environments 
consisted of a compiler and a linking loader. 
Later environments included text editors and 
debuggers, informal requirements and design 
methods, and simple programming standards. Many 
new methods and software tools have been
formulated and built during the last decade. 



A software tool environmental review requires an 
analysis of products generated during a software 
development project and the methods and tools 
useful in generating those products. The 
software life cycle model emphasized the 
importance of both intermediate and final 
products. Each unique software life cycle model
has classes of products associated with
particular life cycle phases while others
transcend them. Requirements definition
products serve as a means of communication with
the customer. They are often informal and may
simply be notes tacked up on office walls,
compiled in a cut-and-paste mode. They may also
be expressed more formally in a language like
Structured Analysis and Design Technique (SADT)
[7]. Requirements specifications are formal
contractual documents that define the system to
be built. They are constructed after
requirements definition and, ideally, are
represented in a formal language or graphical
notation such as Problem Statement Language
(PSL)/Problem Statement Analyzer (PSA) [8]. A
test plan based on the functional properties of 
the system described in the requirements 
specification is also an important product. The 
plan should define both test data and expected 
results as well as procedures for running the 
tests [9]. 

The design of a system is often divided into 
preliminary (or architectural) design and 
detailed design. Both phases result in design 
products that may be either informal prose 
descriptions or diagrams that are generated 
using a formal design methodology such as 
Structured Design [10]. In addition to system 
documents, the preliminary design may also
result in the generation of "build plans." Build
plans describe the order in which modules or
parts of the system are to be designed and
built. The preliminary design may also be used 
to construct schedules, budgets, resource 
management procedures, milestone charts, and 
maintenance procedures [11]. 

In addition to the source and object code 
modules, the programming stage involves the 
construction or modification of management 
products (budgets, schedules, etc.), user
manuals, discrepancy reports, and code-based
test plans. The programming stage also includes
the generation of reports of all testing and
code analysis activities. Validation products
ensure that the software system meets system
functional specifications and requirements, and
also that as many as possible of the "defects"
in implementation are removed. This provides
strong indications of whether the system is
constructed according to the design document and
meets all appropriate criteria as identified in the
requirements/specification document. Tangible
products will appear in the form of exception 
reports and/or warnings about anticipated 
problems. Test and evaluation tools and 
techniques can be used to support the validation 
process. 

Systems operation and maintenance are concurrent
activities. There are products important to, and
either used or generated during, maintenance.
They include configuration specifications,
change control procedures and plans, regression
test data and results, cross-reference
documents, and all requirements and design
specifications. The review stage of the
software life-cycle is an ongoing activity 
within each stage. It is highlighted at key 
milestone points where products from one stage 
enter into another. Therefore the products 
generated in each life-cycle stage constitute 
the review products. 

A tools oriented SET is an evolutionary concept 
that allows migration to successively higher 
levels of SET by implementing more complex tools 
and their related standards and procedures. 
This migration into an environment of ever
increasing tool complexity will be
both time consuming and difficult and, at a
minimum, will require extensive planning and
coordination. Obtaining top level management 
commitment is a vital step in the process. 



5.2 Tool Integration



A common argument against tool integration is 
that it imposes an inflexible development 
methodology on the development staff. One 
imagines a complex tool. To use Tool A, you
must first use Tool B, and the output from A is
always processed by C and then D unless there is
input from E. The idea that an integrated tool
environment must consist of a complex structure
of interconnected tools is misleading. The
fundamental feature of some of the best known
software support systems is not their
interconnections, but the use of a common kind of
data object by all tools or facilities. The
properties of the basic data objects and the
knowledge that different parts of the system
have of objects of this kind characterize the
degree and type of integration within the
system.

A software engineering database can be used to 
build an integrated software development 
environment. The database provides an 
integrating and unifying medium for interfacing 
tools without forcing them into a complex 
structure of interrelationships. Tools obtain 
their information from the database and return 
their results to it without having to interface 
directly with other tools. 

An integrated tool system which uses a common 
database eliminates the need for multiple copies 
of the same information. The existence of 
different copies of the same information for 
different tools often creates a consistency and 
synchronization problem. Every time one version 
of a collection of information is updated or 
changed, other copies must be changed. Both the
expense and tedium of this virtually rule out
the practicality of using certain collections of
tools unless they can be modified to work off
the database. Tools which satisfy the
requirements for tool compatibility
(parameterization of input/output locations and
the ability to call one another) can be attached
to a software engineering database with
interface or "data translator" routines. In
order to maintain flexibility, it is important
to avoid building bridges between pairs of
tools. The bridges should instead be built
between the tools and the database [11].
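
The idea of bridging each tool to a common database, rather than to other tools, can be sketched as follows. This is a minimal illustration under stated assumptions: the `EngineeringDatabase` class, the two toy tools, and the key names are hypothetical and stand in for whatever database and tools an installation actually uses.

```python
class EngineeringDatabase:
    """Toy stand-in for a software engineering database (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key):
        return self._objects[key]


def cross_referencer(db):
    """Tool 1: reads the source from the database, stores identifier names back."""
    source = db.get("source")
    names = sorted({word for word in source.split() if word.isidentifier()})
    db.put("xref", names)


def line_counter(db):
    """Tool 2: reads the same single copy of the source, stores a line count."""
    db.put("line_count", len(db.get("source").splitlines()))


db = EngineeringDatabase()
db.put("source", "a = b\nc = a")

# Neither tool calls the other; each is bridged only to the database.
cross_referencer(db)
line_counter(db)
```

Because there is one copy of `"source"`, updating it in the database is enough for every tool to see the change on its next run, and a new tool can be attached without modifying the existing ones.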



5.3 Benefits of SET Baseline 



Entry into a SET is usually indicated by 
establishing a SET baseline. This is the
minimum set of tools and technology needed for
an initial SET. It is the starting point from
which to migrate to a more advanced SET
technology. These requirements represent the 
critical mass of a SET. They also translate 
into a series of benefits which have common 
applicability to any installation. Therefore, 
an initial baseline SET would- 

. increase productivity;

. upgrade skill capabilities;

. automate routine aspects of software
  design; and

. reduce the time and cost of software
  maintenance.

For example, software tools meeting baseline
requirements could provide an increase in the 
quality and quantity of software code, be much 
simpler to use than writing new programs, have 
expanded software design capabilities, and be 
self documenting. These tools together with the 
standards and procedures to use them could 
constitute an initial baseline SET. 

Each organization should evaluate its own 
position in relation to a SET. Any environment 
that provides tools that supply, at a minimum,
these stated benefits can be said to have
established an initial baseline SET. 

Once the baseline technology is in place, and 
personnel have been trained in its use, the next 
technology "release" should be planned. 
Modifying a SET requires a certain amount of 
overhead, such as the retraining of personnel in 
the usage of new or enhanced features. For this 
reason, periodic versions (or releases) of the 
SET should be planned and implemented, 
incorporating more advanced tools, modifications
and/or enhancements to the existing standards
and procedures, etc. The SET documentation
should be appropriately updated, and personnel
trained in the use of the new enhanced features.

5.4 Establishing High Pay-Off Baselines First

The net effect of establishing a SET baseline 
technology is to stabilize an ADP organization. 
Upon reaching a SET baseline, an organization
will be in a much better position both
managerially and technically to proceed toward
the various higher levels of a tools oriented
SET.

The use of software tools offers immediate 
benefits in terms of time and cost savings. The 
integration of software tools and advanced
management techniques, such as structured
walkthroughs, chief programmer teams, and
program reviews, offers high pay-off potential for
a SET. Some software development techniques which
have consistently been found to offer high
pay-off potential when used with a tool oriented
SET include-

. requirements analysis and validation;
. baselining on requirements
  specification;
. complete preliminary design; and
. process design [12].

Installations wishing to maximize their payback
from investment in new development should concentrate
on software tools and technologies which apply
to the requirements and design stages of
software life-cycle management. Mature
organizations wishing to maximize their
investment return on existing software should
concentrate on programming and software
validation tools. 

5.5 Training in Relation to SET 

The types and quantity of training required for 
a SET depend heavily upon an organization's 
personnel background and the make-up of the SET. 
In most installations, data processing personnel 
are not trained in the engineering disciplines. 
Therefore, a significant upgrade in skills will 
be necessary so that individuals can take full 
advantage of the new technology. 

Any training plan should utilize a balanced 
approach to provide adequate training to develop 
all skills required at a particular job level 
and category within a SET. 

The identification of tools, quality assurance, 
standards and procedures to be used at critical 
points in the plan is important. Tool and 
technique training properly integrated into the 
training program will ensure that an
organization is in tune with state-of-the-art 
technology. 

A SET will progressively lead an ADP 
organization toward a state-of-the-art
operation. Employees will be exposed to a
rapidly changing, complex environment. Everyone
connected with the SET will participate in the
changes and feel their effects which, without
proper training, could become highly stressful.
The mitigation of stress during this dynamic
period should be a major objective of the SET
training.

5.6 Maintaining an Up-To-Date Technology

To maintain a continuing level of technology 
that will keep pace with a rapidly moving 
software industry, an organization should 
consider either expanding its current
information procedures and facilities or
instituting new ones to ensure its personnel and
technology stay up-to-date. Most data
processing organizations maintain some type of
library. It may be as formal as a
central library or as informal as books on a
table in a conference room. The purpose of the
library is to provide a ready source of
reference on information relating to data
processing. Just as a data dictionary is vital
in the management and use of a DBMS, a current
library is vital as a rich source of information
on systems development and management.

Exposure to modern concepts in tools and 
technology can help to increase individual 
productivity. Concepts and disciplines being
taught in contemporary academic institutions are
quite different from those taught in prior
years. Entirely new disciplines, such as
Computer Aided Design (CAD) and Computer Aided
Manufacturing (CAM), with emphasis on
quantitative evaluation and return on investment,
require much more than knowledge of theory [13].
Many data processing professionals have either
limited exposure to these concepts or no formal
exposure. SET requires an organization to
re-tool its environment with software tools and
technology. In conjunction with this, SET will
require individuals to re-tool (train) using
quantitative skills to take full
advantage of the new tool technology.

Success in implementing a SET is a direct result 
of a balance achieved between software life 
cycle management and the application of a 
software tools technology. The balance is 
delicate and is maintained informally by using
the standards and procedures review process and
formally through periodic project meetings with
upper level management. It is at these meetings
that SET performance is measured against the
project plan, future milestones are identified, and
specific direction is provided from upper level
management to ensure that the SET technology is
kept on track. These top level management
meetings are vital in the SET project management
process. They provide performance information
to upper level management on SET project
performance and provide the opportunity for
lower levels of ADP management to receive
specific feedback on the SET project as well as
any new information which could impact or
enhance the SET technology.



6. Description of a SET to be Developed 



6.1 SET Composition 



A SET is composed of standards and guidelines,
procedures, quality assurance, tools, and
training applied to software life-cycle stages
in a planned balance within a carefully
engineered environment. The technology reflects
the process of replacing manual with automated
functions within a software development
environment. The development process is
characterized by the identification of baseline
technologies which incorporate software tools
and techniques, with succeeding baselines
equating to higher levels of software quality,
performance and productivity. Baselines are 
custom engineered for a particular organization. 
They identify criteria to measure performance 
and specify the required tools and technology 
needed to reach a target performance plateau 
specified by and for an organization. An 
organization adopting a SET enters into a 
constant review and evaluation process to ensure 
its conformance with existing SET baseline
requirements and to identify the tools and 
technology for higher baseline SET levels. 

The recommended development plan for a
software engineering technology should include
the following major phases:

. existing engineering analysis,
. SET baseline planning (goals),
. SET baseline development/upgrade.

The exact nature and mix of software tools and 
techniques must be determined during the initial 
baseline identification process. Areas of 
software technology having the highest pay-off 
should be given prominent consideration in an
initial baseline SET development plan. The
composition of any technology is shaped by
factors within the particular organization, such 
as its management structure for software 
development, its staff, its physical workspace 
and computer environment, and its applications. 
As a result, different organizations may have 
different software engineering technologies. 

The initial baseline SET should concentrate on
the programmatic software environment. After
the initial programmatic oriented SET has been
established, the technology should be expanded
to encompass other application areas.
Subsequent SET baselines should include more
advanced tools and techniques. 



The workflow for SET development is a dynamic 
process subject to many factors having 
environmental and technical impact. A properly 
engineered and integrated SET Matrix translates 
into a level of a SET. Replacing manual 
functions in the matrix with software tools and
techniques and institutionalizing their use
changes the complexion of the matrix and
positively impacts the technology. Technology
levels identify higher plateaus of software
engineering in the form of organizational goals.
They identify and define target functional areas 
which can benefit from a carefully engineered 
SET plan. The plan should define the steps 
necessary to upgrade the SET by the application 
of software tools and technology. A technology 
plan is composed of a series of functional 
upgrade plans. Each particular plan does not in 
itself represent a technology level but rather 
a small portion of one. The combination of
individual functional upgrade plans targeted to
meet specific organizational goals at a planned
level of productivity and performance constitutes
a baseline SET. The completion and
institutionalization of all plans within a given
technology level indicates that an organization
has reached that baseline level for SET.
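
The rule that a technology level is reached only when every functional upgrade plan within it is both completed and institutionalized can be written down directly. The record layout, field names, and plan names in this sketch are assumptions made for illustration; the paper does not define a data format for upgrade plans.

```python
from dataclasses import dataclass


@dataclass
class UpgradePlan:
    """One functional upgrade plan within a technology level (hypothetical)."""
    name: str
    completed: bool = False
    institutionalized: bool = False


def level_reached(plans):
    """True only if there is at least one plan and all are done and in routine use."""
    plans = list(plans)
    return bool(plans) and all(p.completed and p.institutionalized for p in plans)


# Hypothetical plans making up one technology level.
level_one = [
    UpgradePlan("automated cross-reference listing", completed=True, institutionalized=True),
    UpgradePlan("regression test library", completed=True, institutionalized=False),
]

reached_before = level_reached(level_one)

# Once the second plan is in routine use, the level counts as reached.
level_one[1].institutionalized = True
reached_after = level_reached(level_one)
```

Treating an empty list as "not reached" is a design choice of this sketch, made so that a level with no defined plans is never reported as achieved.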



6.3 Subdivision of the SET for Work Assignment 



SET implementation requires detailed understanding
of the technology and its organizational
considerations and impact. A reflection of this
understanding is the development of a plan which
identifies software life-cycle stages and is
broken into distinct phases.

"Establishing standards, identifying and 
acquiring the appropriate computer resources 
needed, introducing a few mature tools and 
progressively building on these capabilities is 
a good approach. Prioritizing the steps and
identifying and acquiring high payoff tools and
techniques need to be clearly mapped out. This
should be done at the outset and reviewed along 
the way" [14]. 

Prior to entering a tools oriented SET, an 
organization should plan initially to phase into 
a SET by establishing a baseline technology. 
The advantages of this would include-

. stabilizing an organization;
. establishment of a pilot project;
. phased tool implementation;
. identification of high payoff areas;
. testing of SET on limited scale; and
. slow exposure to software tools.



The formalization and eventual 
institutionalization of a SET is an extremely 
complex and delicate process. Adding to the
complexity is the fact that the SET undergoes
continual evaluation with identified changes
being incorporated back into the technology. A
good first step in establishing a SET is to
experiment with the tools and technology. This 
experimentation should be conducted on a small 
scale utilizing several types of software tools. 
This testing may identify difficulties and 
problems that previously may have gone 
undetected. The realization that these exist 
may cause a re-evaluation of the SET Development
Plan. The intent here is to advise slow 
progression into a SET environment. Changes in 
scope and content may be required to properly 
tailor the SET plan to the environment. Some of 
the project milestones scheduled to be included 
in the baseline SET may better be handled at 
higher baseline SET levels. This determination 
can best be performed after an experimental
stage where organizational SET limitations can 
be identified and factored into a strategic 
organizational SET Development Plan. 



Summary 



Computer software is expensive to develop and 
it is generally not well controlled. This is
largely due to the lack of discipline which
accompanies the development and maintenance
activities associated with software. Software 
engineering is the application of a discipline 
and is no more than adaptation of the
principles found in other engineering fields
to computer software so that the capabilities
of computer equipment can be made useful to
man (i.e., not costly and able to be
controlled). Establishing a SET within an
organization is not a trivial effort, for it
requires involvement by all segments of the
ADP organization and it changes the way people
have been accustomed to developing software. To
be encompassing it must consider all stages of
the software life cycle from requirements
definition and analysis through operations and
review. Further, it must apply the SET
elements of standards and guidelines,
procedures, tools, quality assurance and 
training to these life cycle stages. Also, 
there needs to be a phased plan for 
establishing SET baselines in order to 
stabilize the people and the organization and to
minimize disruption of present software
development activities. Last, and most
important, there must be liberal use of
software tools to automate the disciplines
established and to reduce the amount of labor
associated with software development and
maintenance.

Several of the team structures proposed in the literature, such as
Chief Programmer Team, Surgical Team, and Revised Chief Programmer
Team, advocate a separation of tasks for a programming team,
resulting in specific roles for the members of the team. An
analysis of these roles with respect to personality and task
requirements is presented which enables a better tailoring of
these team concepts to specific projects with a given staff.
Based on the definitions of the various roles of the different
team structures, requirements for a particular position are [...]
schedules and/or budget [...] and team members become dissatisfied.

I. Introduction

Almost since the beginning of programming 
the problems programmers were working on 
were big and complex enough that they had 
to organize and work in groups. In the 
beginning these groups were highly ad hoc 
and unstructured which in many cases 
resulted in a lot of entropy and low 
quality but costly software. The
programming team as a more structured
form of organization was invented as a
cure for many a chaos oldtimers call war
stories. Pretty soon people realized
that just as a program needs not just
any, but a specific kind of structure,
human interaction between programmers
working on the same problem needs
structure, too. One of the first team
concepts which was created was that of
the Chief Programmer Team (CPT) /BAKE72/,
/MILL83/. Many of the high expectations
this concept generated were soon
shattered, however, because the problems
with software development did not go
away. Other team concepts were born,
most notably among them the Surgical Team
(ST) /BROO75/, the Revised Chief
Programmer Team (RCPT) /MCCL81/, and
Egoless Programming (EP) /WEIN71/. The
problems, i.e. cost and schedule
overruns, unreliable software, high
turnover, still persisted, in spite of
these new organizational forms and in
spite of structured design, structured
programming, structured testing,
structured walkthroughs - one might be
tempted to say structured anything.

People started investigating the nature
of these structures, for instance how
team structure, most notably reporting
structure, influences program structure,
and they realized that indeed it does to
a great extent. On the other hand we
know that most problems have an
indigenous structure, e.g. passes of a
compiler or the hierarchical structure of
many application programs. The question
then becomes whether the team structure
and the structure of the problem are
compatible. If they are not,
difficulties are bound to arise, because
following the team structure then goes
against the grain of the problem
structure and vice versa.

It is important to realize that the need 
for structure in the product stems from 
the desire to reduce the complexity of 
the problem and its solution by 
partitioning it into logically 
self-contained parts having the simplest
interfaces possible. Likewise the 
motivation underlying the definition of 
programming team concepts arises from the 
need for dividing tasks into 
self-contained parts which can then be 
assigned to members of the team and 
carried out by them with the least need 
for further clarification and even
communication with other team members. 
Another motivation was specialization. 
This is rooted in the belief that a 
higher degree of specialization is more
economical, because most people are not 
equally talented or skilled in all tasks 
during the software development process, 
and if we can assign the most highly 
skilled person to a task we get better 
quality and higher efficiency. This is a 
concept which sometimes has been employed 
to advantage in the software development 
industry as it has in other industries 
since Smith in 1776 first advocated the 
concept of division of labor /SMIT76/ 
which was later taken up and further 
developed by C. Babbage in 1832 who 
emphasized decreased learning time and 
increased skill due to repetition as some 
of the advantages of specialization
/BABB32/. Later "scientific management"
/TAYL11/ went even further in
specialization, leading to the attitude
underlying assembly line work. In
programming we also have the possibility
of considerable specialization and
standardization concepts which, taken to
the extreme, can lead to assembly line
programming with all the pros and cons
assembly line work is known to have.

Through research in ergonomics and 
psychology we have learnt that different 
degrees of specialization require 
different personality characteristics.
For example, if a very strictly specified
and standardised task is assigned to an 
individual with high creativity needs and 
needs for high growth and responsibility, 
this person will very likely feel bored 
and dissatisfied because these needs are 
not being met. Considerations like these 
lead to another set of factors which have 
to be considered: personality traits
which either facilitate or impede a set 
of task assignments within a chosen team 
structure. Lastly the team has to fit
into the overall organizational
structure, otherwise too much external 
friction or too little involvement with 
the rest of the organization may result. 

The remainder of this paper will review 
the most commonly advocated team 
structures, i.e. CPT, RCPT, ST, and EP,
and analyze them with respect to 
personality requirements and degree of 
specialization involved. It will
investigate the types of projects which
are most suited for which team concept
and present a set of guidelines on how to
pick the right environment for the right
task.

II. Review of Concepts in Task Analysis 
and Design 

As pointed out earlier, the advent of
programming team structures was motivated 
by the need for specialization. One 
improvement hoped for was a decrease in 
the overhead effort due to the need for 
communication between all people who have 
to interface with each other. A second 
was that we would no longer need the
"renaissance man (or woman) of
programming" who had to be able to do all
jobs involved in software development
equally well. Since people often have
very specific specialized talents, this
notion had been rather unrealistic to
begin with. Programming is a complex
task involving a multiplicity of
functions and at times, as we know from
software for space flight to software for
automatic shutdowns of nuclear reactors,
may be rather critical with a high need
for reliability.

Before selecting a specific team
structure for a software development
effort we have to investigate the
following dimensions for an adequate fit:

1. Degree of formalization to enable
proper management control for quality,
schedule, cost, etc. according to goals
and priorities.

2. Interaction with environment. A 
development team develops software for a 
user and thus has to relate to the user 
or its representative. Furthermore the 
team is also part of the organization
within which it is placed.

3. Product structure. This refers to 
the final software product as well as the 
products of intermediate stages 
(requirements, specifications, design, 
etc.)

4. Needs of the individual and
requirements for the individual. Here we
include the degree of clarity of task
assignment, the amount of communication
needed as well as an assessment of 
adequate task attributes for the
individual's motivational needs. We need 
to look at what is called task analysis
and design. One of the theories which is
helpful in this context is the Job
Characteristics Theory by Hackman and
Oldham /HACK76/, /HACK80/. This theory
states that there are 5 different core
job dimensions which influence three
major psychological states, the primary
determinants of motivation. The 5 core 
job dimensions are: 

* skill variety - how many different 
skills does a job require? 

* task identity - to which degree is a
job done from beginning to end with a 
visible outcome 

* task significance - what impact does 
the job have on the environment 

* autonomy - freedom and independence to 
do the work (schedule and procedures) 

* feedback - how much information about
quality of work performance is given 
during the work 

The 3 major psychological states are: 

* Experienced meaningfulness of the work

* Experienced responsibility for work 
outcomes 

* Knowledge of results 
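
The five core dimensions can also be combined into a single figure. Hackman and Oldham's Motivating Potential Score (MPS) averages the three dimensions that feed experienced meaningfulness and multiplies the result by autonomy and feedback, so a low rating on either of the latter two pulls the whole score down. The sketch below is illustrative only: the 1-to-7 rating scale follows their Job Diagnostic Survey, while the class name, the function name and the sample ratings are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobProfile:
    """Ratings of one job on the five core dimensions (assumed scale: 1 to 7)."""
    skill_variety: float
    task_identity: float
    task_significance: float
    autonomy: float
    feedback: float


def motivating_potential_score(job: JobProfile) -> float:
    """MPS = mean(skill variety, task identity, task significance)
             * autonomy * feedback."""
    meaningfulness = (
        job.skill_variety + job.task_identity + job.task_significance
    ) / 3.0
    return meaningfulness * job.autonomy * job.feedback


# Hypothetical ratings for a broad role and for a narrowly specified role.
broad_role = JobProfile(6, 6, 6, 5, 4)
narrow_role = JobProfile(2, 3, 4, 2, 5)

print(motivating_potential_score(broad_role))   # 120.0
print(motivating_potential_score(narrow_role))  # 30.0
```

Because autonomy and feedback enter as factors rather than as terms of a sum, a role with good feedback but very little autonomy still scores low, which is the pattern the later discussion of narrowly specified roles relies on.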



Obviously not everybody needs the same
amount of task significance, autonomy, 
feedback, task identity or skill variety. 
How much an employee needs in these core 
areas in order to experience that he/she 
is doing a meaningful job providing 
enough responsibility and knowing enough 
about the results to be satisfied depends 
on that person's growth need. Some
individuals require high scope tasks
with very high levels in all job
dimensions mentioned, others become 
overly stressed when a task requires too 
many skills, is too loosely defined (too 
much autonomy) or does not provide 
instant feedback at the end. Beyond job 
characteristics we consequently have to 
look at the skill level of a person (the 
more skills the less stressful a
situation, but an overly skilled person 
may get bored) but also at the three 
basic work motives: 

* task - performing it and becoming more 
skillful 

* relationships - popularity, interaction 

* influence and direction over people

Motivation can be positive or negative
and vary in intensity. We have mentioned
task oriented motives, external stimuli.
Internal motivators are personality
traits. It has been said that DP
professionals have a very high motivating
potential due to their high growth need
(/COUG78/) which is connected to the work
motives task and influence and direction,
but that their social needs
(relationships) are low. This is very
similar to the motivational profile for
engineers (/STEV77/). This usually also
means that their skills in this area are
not as well developed and/or that they
experience a higher stress level when
faced with such tasks than people with
higher relationship needs. Stevens and
Krochmal report that these personality
traits result in the following "turn-ons"
and "turn-offs" for the individual
(/STEV77/, p. 168):

Turn-ons: 

1. Moving forward on project when he/she 
feels it is appropriate, having and 
maintaining control. 

2. Being able to measure own progress. 

3. Having to keep in touch only with
project progress (that affects him/her).

4. Brief, to the point, pragmatic
communication

5. Practical work

6. Personal goals, in specific project
goals.

Turn-offs:

1. Waiting for politics, etc. or things
he/she cannot control. 

2. Not knowing how he/she is doing or 
how work is progressing 

3. Having to keep up with things that 
don't directly concern him/her, e.g. 
administrative meetings. 

4. Policy statements, personnel forms,
regulations.

5. Having to remember feelings, 
birthdays and social events. 

6. Group concerns and organizational 
goals.

Although some of these traits shed light 
on current problems in software project
management, notably staffing and control
(/THAY82/), and explain the preoccupation
with tools and development methodologies
versus proper management procedures
(/ZOLH82/), this list of motivators and
demotivators should be regarded as a
checklist of traits to be considered and
evaluated rather than assuming that
they are always present to a high degree.
After all, individuals do differ. If we
want to select the proper team structure
for a software development effort we have
to look at these factors to define tasks
with the proper attributes for the
members involved so that high internal
work motivation, high quality work
performance, high work satisfaction and
low absenteeism and turnover are
facilitated as much as possible. With
these thoughts in mind let us now turn to 
a review of the individual team 
structures. 

III. The Chief Programmer Team (CPT) 

The CPT provides a high degree of 
formalization within a strict
organizational structure, clear
leadership through the Chief and the
possibility for specialization through
functional separation. The reporting
structure within the team is explicitly
defined, as are the relationships between
team members. Some functions are
explicitly defined. The rigid structure
facilitates management control,
visibility of product and personnel,
communication and product structure. It
also tries to guarantee continuity by 
ensuring that at least two team members,
the Chief and the Backup Programmer, are
familiar with every aspect of the 
project. The nucleus of the CPT involves 
the following individuals: 

1. Two technical experts, the Chief 
Programmer and the Backup Programmer. 
While the Chief Programmer is the 
undisputed technical leader who is 
responsible for the team's success, who 
develops all documents of the early 
phases (requirements, specifications, 
design), who codes and tests the critical 
parts of the system and (closely) guides 
and supervises the other team members, 
the Backup Programmer, although he does
not have the decision making power of the
Chief Programmer, acts as a backup leader
and peer to him/her. As such he has to 
be totally familiar with the project in 
order to be able to take over leadership 
when necessary and to participate in all 
important technical decisions. He is 
responsible for the test plan and also 
usually does research work for the Chief 
Programmer. Obviously these two
functions require considerable expertise
in software development; the Chief
Programmer also has to have sound
management experience. The other
functions do not require technical and 
managerial expertise at this high a 
degree. 

2. The clerical assistant or programming 
secretary makes sure that the documents 
are current and visible and maintains 
libraries, test data, test results and 
project documentation. 

3. The programmers are junior personnel
who implement the code according to the 
Chief Programmer's directives. 
Optionally there may be a project 
administrator who reports to the Chief 
Programmer and takes over some of the 
administrative tasks. 

When we look at the four team dimensions, 
we can see that this team concept rates 
very high in terms of dimension 1, degree 
of formalization. Provided that the 
Chief Programmer does his/her job 
correctly there is adequate possibility 
for management control for quality, 
schedule, cost according to priorities 
and goals the Chief Programmer sets for 
the team. No conflicts arise due to 
ambiguous reporting structure, or 
conflicting goals set by several people. 



The team relies for interaction with
its environment wholly on the Chief
Programmer. His responsibility is to
talk to the users as well as to represent
the team to the rest of the organization. 
Since the team structure is hierarchical 
the product structure tends to be as 
well. Intermediate deliverables such as 
requirements and specifications tend to be
uniform reflecting one basic philosophy, 
since they are the work of one individual 
(in collaboration with the Backup 
leader). Often they show hierarchical 
structure, because they already reflect 
the division of labor for the Junior 
Programmers. This role definition 
reflects the need for few, gifted 
individuals during the early phases of 
software development (/MYER76/). When we
look at the requirements for the
individual team members there are several
classes: Highly qualified (both
technically and managerially) individuals
for the positions of Chief Programmer and 
Backup Programmer. They have to possess 
good people skills and be adequate 
communicators in order to relate well 
enough to the user (representative) to 
understand what the user wants. Second 
they have to be managers who can plan and 
control, set goals and priorities and 
assign tasks, evaluate progress and 
report on progress to the higher 
management level(s). Third they have to 
be technical experts in the field of 
application. They have to be able to 
decide which aspects need to be 
investigated for a feasibility study, 
what the software's functions will be, 
what the human/machine interface will 
look like, as well as being able to make 
the major design decisions. The Junior 
Programmers on the other hand do not have 
to be quite so universally gifted.
Depending on the level of detail of the 
design specification they actually may 
have a very structured, specialized 
coding task to do with little freedom or 
room for creativity. Once the work is 
assigned and specified there is little
need for communication and due to the
very technical level at which the Junior
Programmers work they do not have to have
very outstanding communication skills nor
the skills involved in requirements
analysis or specification writing.
Management skills are obviously not
needed by them. The programming
secretary has to possess skills in Word
Processing, Technical Writing, some 
programming experience as well as the 
ability to communicate with team members 
about such issues as change control and 
document preparation. 

From the explanations so far, it is quite 
obvious that two of the people in this 
group have to possess rather high level
skills in technical and management areas,
the Chief Programmer and the Backup
Programmer. The other roles are more
specialized. The Junior Programmers are 
only involved in the implementation 
phases. The programming secretary 
fulfills the role of a communicator and 
standards bearer, again a specialized
function. The administrative assistant
is, although involved throughout the life
cycle, assigned administrative tasks
only. If we try to rate these roles in
terms of their core job dimensions, then 
the CP has the highest ratings in all 5 
dimensions. This can be very positive 
for an individual with high needs in all 
these areas, but can lead to 
over-activation through stress and to
role-overload with the resulting decrease 
in job satisfaction. The CP also has to 
be fairly well balanced in his skills and 
work motivation in all three areas: 
task, relationships, and 

influence/direction. If we have an 
overemphasis on one or a deficiency in 
another there may be problems. This is 
clearly pointed out in a critique of the 
concept (/MCCL81/) which mentions the
dangers of (a) expecting too much from
the CP, the "Superprogrammer", or (b)
having a power-hungry prima donna at the
helm (imbalance in the motivational
area). Case (a) can be dealt with 
through task redesign by delegating some 
of the responsibilities, the most obvious 
being along major skill boundaries which 
span the entire software lifecycle: 
managerial, administrative, and
technical. Another possibility is to do
this according to phases: requirements 
and specifications versus design and 
implementation. Since a considerable 
amount of communication is involved in 
either solution it is not a very good 
idea to separate the functions strictly 
along one or the other of these 
dimensions. It leads to too much 
communication effort between the 
different "commands" as responsibilities 
are shifted, unless very strict standards 
are imposed, which depending on the
personalities involved may not give them 
enough of a sense of autonomy to keep 
them satisfied. 

The second danger McClure points out is 
the lack of checks for the CP's ego. 
This can basically be dealt with through 
a similar approach: built-in delegation 
of authority to make decisions, either 
rotating decision making power through 
different phases or splitting it 
according to task areas: administrative, 
managerial, technical. 

The next question is to whom are these 
responsibilities delegated? An obvious 
first solution is to the Backup
Programmer. He has to have all the
qualifications of the CP, but so far has
no guaranteed authority unless
voluntarily delegated from the CP. Thus
the team concept creates a situation
which can create high levels of negative
motivation due to a low ranking in the
job dimension autonomy. This combined
with high motivation levels in the area
influence/direction can lead to a potent
and lethal situation: why am I working
so hard when I do not make a difference
anyway? I know as much as the CP, but
nobody does what I think should be done.
Why bother? In other words the result of
this structure can be a marked lack of
enthusiasm and resulting lower
performance. It should be noted that
this need not occur, if the relationship 
skills (and motivation) of the CP are 
well developed or if the autonomy needs 
of the BP are not very high, but not
everybody has the talents and personality
traits to be a good "vice president" 
whose role is to know it all, but to stay 
in the background until something 
happens. 

The Junior Programmers may also lack 
enthusiasm when their work is
"overspecified" for their skills and
their needs. A JP may be very happy and 
content with a precise low level design 
specification for his/her work at first, 
but resent the lack of involvement in 
design decisions, the lack of 
meaningfulness in his work because he is
not given an adequate picture of the 
whole product, and the lack of 
communication for his relationship needs 
at some later time. Again, this may but 
need not happen, especially if the CP 
realizes the growth needs of his staff 
and delegates this part of the work when 
a JP is ready for it. Unfortunately the 
CPT does not give guidelines for this. 
It does provide a good learning
environment for JPs when the CP and the
BP actively work at making it a good
one. Again, there are no guidelines.
Such guidelines have to include sample task
assignments for all levels of JPs, since
obviously a transition from JP to BP or 
even CP is only going to be successful, 
if the JP has been trained in all the 
areas the CP has to cover. This not only 
includes the low communication technical 
tasks, but also the high communication 
technical tasks such as developing user
requirements and specifications. It 
spans all the technical, administrative 
and managerial aspects of the entire 
software development lifecycle. Unless a 
JP is trained in all of these, he may not 
be able to develop his/her talents as he
needs to (remember, they are known to
have high growth needs), nor will he/she
be able to prepare adequately for
becoming a BP or a CP. Suggested
additional guidelines for training are:

* evaluate growth need and motivations
(what do they like to do) 

* evaluate major skill areas (what are 
they capable of or show promise in) 

* involve JP to skill level in all of 
these areas (let JP do some of the work 
he wants to and can do) 

* reevaluate additional needs and 
motivations (take stock and give feedback 
to JP) 

* add responsibilities in these areas to 
help the JP grow. 

Obviously this needs time and this time 
has to be built into the schedule. It is 
fallacious to think that learning takes 
no time on the job. It does, for the 
teacher as well as for the student. The 
idea of creating a good learning 
environment had been one of the goals of 
the CPT, but the definition of roles was 
too static and did not explain adequately 
the progress of the JP and how it should 
be dealt with within the team structure. 
Additional responsibilities should be 
added in all areas of the tasks involved, 
on the administrative, managerial and the 
technical level with the objective to 
expose the JP over time to all the 
aspects of work of the CP which now even 
includes teaching explicitly (another 
responsibility which can be delegated in 
degrees). It is also suggested that the
JPs are familiarized with the skills of 
the programming secretary and the project 
administrator, if they are motivated to 
learn these skills. 

IV. The Surgical Team (ST) 

This team structure is very similar to
the CPT. Like the CPT it is based on the 
concept of specialization, but unlike the 
CPT which enables specialization to the 
extent of assembly line programming for 
the JPs with the resulting low level of 
task identity, task significance,
autonomy and possibly feedback, the
Surgical Team (ST) specializes such that 
task identity and task significance still 
rate high. This is achieved through 
defining roles for the following areas of 
specialization: technical, 
administrative, editing and clerical. In 
particular the roles of the team members 
are defined as follows: 

1. Surgeon

He is the technical manager much the same 
as the CP. One significant difference in 
the ST is that a lot more administrative 
tasks are delegated to the administrator 
(who reports to the Surgeon). This 
narrows the skill variety dimension 
somewhat, which had been a cause for 
possible role overload in the CPT. 



2. Copilot

The Copilot's responsibilities in the 
surgical team are the same as the backup 
programmer's in the CPT. In addition, 
the Copilot is responsible for 
interfacing with other teams. This 
additional function enhances the role of
the copilot, takes some of the workload 
off the Surgeon and gives the copilot 
some visibility which serves to increase 
the sense of influence and importance
which this role lacks otherwise. 

3. Administrator

The administrator is responsible for
personnel, budget and procurement which
includes space, computer time, technical
tools, and also interfaces with
management. This role requires limited 
technical expertise, but a good deal of 
administrative knowledge and qualities, 
negotiating and planning skills. 

4. Editor

The responsibilities of this function are 
to generate all project documentation. 
This requires good technical writing 
skills and communication skills with the 
other technical members of the team. 

5. Secretaries

One or two may be necessary to support 
the tasks of the administrator and the 
editor. They are clerical support 
personnel who need typing and
communicative skills. 

6. Programming clerk 

This role is the same as the programming 
secretary of the CPT with the same 
responsibilities except for those which
now have been assigned to the editor. 

7. Toolsmith

This is a new role specializing in 
providing and keeping operational all 
necessary technical tools such as 
utilities, libraries, debuggers, etc. 
Part of this function was unspecified in 
the previous structure and probably 
assigned to one or the other of the JPs 
on an ad hoc basis; part of it was the
responsibility of the programming 
secretary. This role requires the skills 
of a good systems programmer as well as 
some communication skills, because even 
if the editor is responsible for
generating the documentation, the
toolsmith has to communicate to him/her
what needs to go into it.

8. Tester 

The function of the tester includes the 
implementation of the test plan (which is 
provided by the Surgeon), creating test 
data, test drivers, debug procedures and 
the like. It should also include
evaluation of tests and feedback to the
group on how well they are doing in their
implementation efforts. Data collection
for a software metrics database, which can
be used for empirically founded cost and
schedule estimates as well as quality
predictions, could also be included.
For more detail see /DEMA83/. The
function of the tester is one which does 
not require the constructive ability of 
implementing a coded solution, rather 
he/she needs a "destructive" talent to be 
able to and to enjoy finding errors and 
knowing how to go about finding them, and 
as many of them as possible. A patient,
perceptive, analytic, detail-oriented mind
is needed for this job. It is not enough
to find out that there is something
wrong; one must also find where and what it is.
These talents are not the same as those 
of a good designer or of a reliable 
coder. 

9. Language Lawyer 

This individual is the expert on 
programming languages and knows the most 
efficient ways to implement the design 
specifications as well structured code. 
This definition specifies the function of 
coding. Again this is a very strictly 
defined function with a high degree of 
specialization.



The motives underlying this team 
structure are to provide a formal 
structure which enables management control
for quality, schedule, cost according to 
goals and priorities. This goal is 
obviously met very nicely. Like the CP in
the CPT, the Surgeon in the ST has the
power to exercise as much influence and
direction as he/she sees fit. The role
definitions are even more explicit, and 
narrow for some, than in the previous 
team concept. Interaction with the 
environment is emphasized through the 
role of the administrator (to 
management), the co-pilot (to other 
teams), the editor (project documentation)
and the Surgeon who is still responsible 
for user interaction. Since the 
administrator reports to the Surgeon, 
interfacing with management may have its 
problems, because the administrator 
reporting to the Surgeon may not have 
adequate negotiating power. Product 
structure will again reflect the 
cooperation between Surgeon and Copilot 
in the early phases and result in the
hierarchical structure we have seen in 
the CPT for the very same reasons. We 
can expect uniform project and product 
documents, since they are written by the 
same person. When we look at the needs 
and requirements of the individual, we 
can clearly see a reduction in skill 
variety for all functions involved. This
is one of the declared goals of the ST. 
It has also been pointed out that this 
can pose motivational difficulties, if 
one of the specialized tasks is assigned 
to a person who needs a higher degree of 
skill variety. Task identity may also be 
a problem, because the specialized skills 
of the different functions may not be
needed during the entire life cycle and
this subtracts from the sense of being
involved in a job (i.e. developing a 
particular piece of software) from 
beginning to end. Task significance is 
very high due to the idea to give team 
members functions which indicate that 
they are experts in their own right.
High task significance is supposed to 
increase team morale and improve 
individual recognition. With the
specialization the way it is proposed
here comes a considerable degree of 
autonomy in the area of specialization as 
far as procedures go, but not necessarily 
with respect to schedule. The degree of 
communication and cooperation these 
specialized roles require prevents this. 
We also have functions defined which may
not be perceived as having enough
autonomy which actually may be due to
overspecialization (not enough skill
variety). For example the language
lawyer may feel like a coding machine and 
the toolsmith may not perceive enough 
connection to the goals of the rest of 
the team. Feedback is generally good in
this structure due to the need to use
each other's work. The major potential
difficulties in this structure arise due
to limited skill variety which may be
experienced by the team member as work
which is not meaningful enough. The
roles of the team members stress 
complementary skills and thus are 
enhanced by complementary work motives. 
The more technically oriented roles require
task motivation and, depending on the
degree of interaction with the
environment, relationship and/or
influence/direction motivation (in the
case of the Surgeon). The Surgeon and
the Copilot are sharing some of the work 
more equally now, but the Surgeon may 
still experience role overload and
over-activation due to the variety of skills
required from him/her. And the degree of
autonomy combined with a high
motivational level in the
influence/direction area may pose
problems for the Copilot. 

The goal of a good learning environment 
is remarkably absent from this team 
definition. And upon investigating the 
potential for growth and development of 
the team members it is clear from the 
definition what that means: the team 
members are recognized as experts in 
their respective area and that is it. 
They are supposedly experts. In other 
words, the concept of a junior member
does not come up. On the other hand DP
professionals have high growth needs
(/COUG78/, /FITZ78/); they tend to want
to learn new things and tend to get 
bored, if they have to do the same tasks 
over and over. One solution is to train
them outside for new and now more complex
tasks IN THEIR AREA OF SPECIALIZATION or,
if the function requires more than one
person, to add a trainee who during
development can learn under the
supervision of the expert. Another issue
then is the question of how to train a
Copilot or a Surgeon. Since all the
other roles are so specialized, they do
not address the training in the
multiplicity of skills required from the
Surgeon. The only remedy here is to
rotate the prospective Copilot and/or
Surgeon through the different functions 
on the team. For some he may only have 
to serve as an apprentice, for others he 
may have to acquire skills high enough 
for the expert position. This ensures 
that there is a promotional path within a 
function as well as beyond functions. 
Adequate promotional paths are considered
prime motivators for continued job
satisfaction as they provide a pattern 
for fulfilling growth needs. The concept 
of apprentices which was added here also 
serves to increase the sense of autonomy, 
of being in control for those roles which 
may be perceived as having very little 
otherwise (e.g. Copilot, Language
Lawyer).

This team concept should work well when
there are two well-rounded and highly
qualified individuals available who can 
adequately fulfill the roles of Surgeon 
and Copilot whereas the rest of the team 
members have specialized interests and 
task motivations in the areas defined by 
the functions above. They have to 
possess a degree of sophistication
commensurate with the complexity of the 
project. People with a need for high 
skill variety are not expected to 
function as well in this type of 
structure in roles other than Surgeon or 
Copilot.

V. Revised Chief Programmer Team (RCPT) 

This team concept is a further
development of the CPT by McClure
(/MCCL81/) which tries to remedy the
following perceived shortcomings of the
CPT:

* role overload of CP 

* inadequate level of autonomy of BP 

* environment not open and sharing enough 

* project too dependent on individual 
team members (i.e. CP)

* not enough visibility for team to the 
outside 

* inadequate degree of formalization of 
individual's responsibilities 

* CP has too much power 

She tries to overcome these shortcomings 
mainly by redesigning the tasks and 
responsibilities of the CP and the BP, 
reducing the areas of responsibility of
the CP by creating two new positions,
that of the user liaison and that of the
administrator, who takes over all the
administrative tasks of the CP with the
administrative power (the CP now reports
to the administrator). The user liaison
also takes over some of the
responsibilities of the CP, including all
direct dealings with the user. The
coleader's role is enhanced through
additional areas of prime
responsibility, notably representation
of the team to the outside and
coordination of project turnover with the
maintenance group. He also now has the
responsibility of developing the test
plan. The CP only reviews it.

If we compare this structure to the 
previous two, then we can clearly see
that some of the problems of the CPT and
the ST have been overcome. Work and
recognition are more evenly spread. At
the same time there is enough structure
in this team to enable adequate
management control. As a matter of fact
the tasks are more precisely defined in
this team concept for the roles of 
Surgeon/CP and Copilot/BP than in the 
other concepts. Interaction with the 
environment is a lot more emphasized than 
before by creating two new positions, the 
administrator (interface to management) 
and the user liaison (interface to the 
user) and explicitly mentioning the need 
for communication to other teams. 
Tending to those needs is the express 
responsibility of the Coleader. 
Maintenance preparation is another issue
which is not explicitly addressed in the
other team structures, but here it is the
Coleader's job. The team, since it is
not as strictly hierarchically structured,
should adapt to a wide variety of possible
product structures. Through the primary
involvement of mainly three people, the
leader, coleader and the user liaison,
one can still expect a fairly unified
product developed in a consistent design
philosophy. The function of user liaison
ensures that the product will actually 
meet the user's needs. Looking at the 
requirements and needs of the individual, 
we see that the roles of the four nucleus 
positions no longer require the same 
skill variety as in the CPT or the ST.
As a matter of fact the four positions of 
administrator, project leader, coleader 
and user liaison are created by following 
the concept of division of labor and 
specialization. All technical tasks are
the responsibility of the project leader
and coleader, the administrator only
handles managerial and administrative
functions and the user liaison
specializes in user/developer
communication. For all these positions
this means a reduction of skill variety
compared to the CPT. The ST already had
made a step in this direction, but
mostly for the roles other than Surgeon 
and Copilot. Thus the ST and the RCPT 
differ markedly in the direction of 
specialization: The RCPT reduces skill 
variety for the positions most critical 
to the first phases of the software 
development lifecycle, whereas the ST
specializes the programmer positions, but 
leaves the others with a much higher 
degree of skill variety. The RCPT 
mentions similar functions as the ST 
does, but it does not mandate whether 
they should be done by a "specialist" or 
by several people together with a 
resulting higher skill variety for 
everybody. This obviously increases this 
concept's potential for skill variety as 
well as task identity (which is true for 
the four nucleus positions anyway). Task 
significance is higher in this concept 
than in the previous ones, provided that 
the programmers are not assigned to the 
specialist roles of the ST. The RCPT 
gives the project leader considerably
less autonomy in the administrative 
aspects of the work, but still leaves 
freedom for schedules and procedures. 
Feedback between the members of the 
nucleus is expected to be fairly high, 
since they work on a preliminary product 
together (requirements, design), before 
the final product is ready for release. 
The amount of feedback for the 
programmers is uncertain and depends on
the actual work assignments and control 
mechanisms chosen which are left 
unspecified in this concept. As a result 
this concept has a good potential to 
structure work such that it is
experienced as meaningful, that the team 
members feel they are responsible for 
work outcomes and possibly that they know 
about the results of their effort. To 
ensure this, it is suggested that someone 
from the development team is involved in 
product maintenance once the software has 
been delivered. Why that is supposed to 
give feedback to ALL members of the 
development team is unclear, however.

Due to the degree of specialization of 
the members of the nucleus we need people 
with different areas of skills and 
motivation: managerial skills and
influence/direction motivation as well as
relationship motivation for the
administrator. Primarily task motivation
and some relationship motivation is
needed for the leader and the coleader,
relationship and task motivation for the
user liaison, and task motivation for the
programmers. The skills required are
managerial and administrative for the
administrator, technical and
communication skills for the project
leader and the coleader, a great
deal of communication and negotiation
skills and some technical knowledge for
the user liaison, and technical skills for
the programmers. If we have people with 
these qualifications we can use the RCPT 
to advantage. 

Once again, this team concept does not 
mention training and professional growth 
as part of the concept. It is possible 
however, to use the suggestions made for 
the previous two team structures. Of all 
the team structures reviewed this seems 
to be the most balanced one (consequently 
suited for the most balanced set of 
people).

VI. Egoless Programming Team (EPT) 

This is the last team concept to be 
reviewed. It works on the basis of free 
cooperation with no specific roles or 
reporting structure within the team. 
Everybody is responsible for everything.
The team works towards a common team goal
in a totally democratic work environment.
If there is a leader or if there are
assigned rules, they have been agreed
upon by the majority of the team members
and are only assigned "until further
notice" when a subsequent vote changes
the assignments. Thus team leadership
may rotate. So can function. The idea
behind this concept is to give full
autonomy to the members of the team. In
other words there is no formalization of 
team structure which enables management 
control. It is thought to reinforce team 
spirit similar to the Volvo experiments 
(/GYLL77/, /FOY76/). This nonhostile
environment is hoped to be an excellent 
learning environment because everybody is 
involved in everything. One other reason
for advocating this team structure is
that it is dangerous and unproductive to
allow programmers to "sit on their code",
because it causes them to regard it as
an extension of their egos, resulting in
tunnel vision and more undiscovered bugs.
Also, since everybody is involved 
equally, the result should be a better 
integrated system. Code exchange is 
mandated as an important part of the 
development with the hope of having a 
more visible, better readable and more 
reliable system. As mentioned before, 
this concept does not provide adequate 
management control. This concept, also 
called "autonomous work groups", may have 
worked in auto production where tasks are 
well defined and repetitive, but
software development does not have
these properties, which increases the
complexity of the work significantly.
Software development projects also tend to take
much longer than putting a car together.



The democratic approach often proves to 
be much too loose for management control,
not only because the performance of the
individual cannot be evaluated easily,
but also because, since nobody has the
decision making power to settle disputes
(over design decisions for example), the
whole team can turn into a debating club
where no work gets done. Decision making
may be postponed indefinitely. During a
crisis when leadership is needed, it is
often hard to find somebody who is 
willing to take it over. 

Visibility of the development team to the 
outside is another problem. There is no 
one person to whom the user should talk,
and there is no one person who is in
charge of communicating with other teams
or with management. This can be very
confusing and frustrating for the people
involved. Product structure again tends 
to parallel the (informal) team structure 
which now depends on how team members 
relate to each other. The earlier 
deliverables such as requirements and 
design specifications probably will not 
be quite as uniform, since a lot more
people will be involved and a lot more 
different opinions need to be reconciled 
and integrated. 

The individuals in a team structure like 
this need to be able to deal with high 
skill variety or be able to negotiate a
task commensurate with their inclinations
and talents. Otherwise they may end up
confused, overstressed and as a result
experience motivational problems. Since 
everybody works very closely together and 
is involved in every aspect of the 
software development process, task 
identity and task significance are high. 
Autonomy also is rather high to the point 
that team members may become disoriented, 
confused, or unmotivated, because "nobody 
is telling them what to do". Feedback is 
built into the system through working so 
closely with others. As a result this 
team concept lends itself to having its
members experience the meaningfulness of
their work and knowledge of the results.
It tends to have problems in the area of
perceived responsibility for work
outcomes of the individual. People with
strong task motivation may be very happy 
in this environment, but they may also 
experience a lot of frustration when they 
feel that they know best, but others 
don't agree and they lack the 
relationship motivation and the 
negotiating skills to deal with the 
situation. All members have to be 
relationship motivated to a degree. If 
there are too many people who have 
significant influence/direction
motivation, they may all strive for being
the group appointed leader and severe
conflicts may result. This concept seems
to work best when the group is small,
because there are fewer people to
communicate and negotiate with, when the 
members acknowledge each other as equals, 
and when they have enough expertise and 
are goal oriented enough to be able to 
set and achieve their own objectives. 

The claim that this team concept provides 
an excellent learning environment is only
partially justified. A "new kid on the 
block" may in the beginning pose more of 
a problem than be an asset. Often groups 
"isolate" unproductive members by giving 
them tasks which do not have a lot of 
impact on the group's success, thus 
pushing the member to the periphery.
This avoids the extra teaching and 
communication effort and reduces the risk 
for the rest of the team. It does not 
motivate the junior person a whole lot 
though (low task significance). A better 
idea is to have a mentor for the trainee 
or convince the group that besides 
developing software they ALSO have the 
goal of educating the junior member, thus
making this task just as much a team
objective as software development itself.
To achieve this there must be a clear
incentive (reward) for the team, e.g. a
mentor reward (competition) and/or a 
monetary incentive for the group, based 
on the relative learning achievement of 
the trainee. Teaching may be done by 
rotating mentorship, so that everybody 
gets a chance to be the teacher. 

In conclusion, the egoless programming 
team has some advantages over the other 
teams in terms of autonomy for the 
members, but due to its poor management 
controls and high need for interaction 
and communication, it should only be used 
for very small teams where the people 
involved relate well to each other. This 
obviously limits considerably the size of
the project which can be done. It is a
disastrous idea to use this concept with 
people who need close supervision or are 
not able to set realistic goals for 
themselves. 

VII. Conclusion

This paper attempted to show how the most 
commonly advocated team structures can be 
used, what their characteristics are, 
where their advantages and disadvantages 
lie and how to select a proper team 
structure based on work and people 
characteristics. Some of the answers 
this paper tried to provide have been 
adapted from methods for task design (a
good textbook is /GRIF82/). Because of
space limitations, the problem of task
and employee evaluation was dealt with in
a general way. However, there are job
analysis questionnaires and core job
dimension analysis tools available (see e.g.
chapter 5 in /GRIF82/) as well as methods
for measuring motivation levels in the
areas of task, relationship and 
influence/direction. Specific evaluation 
instruments for members of a software 
development team are basically 
nonexistent. However, the more general 
instruments are quite suitable for this 
purpose if one is willing to accept that 
it will require somewhat more effort to
use them than if one had a more specialized
evaluation instrument. At this time no
experimental results are known to the
author which evaluate projects, their
strengths and weaknesses as a function of
team structure and team member
characteristics and provide a statistical
basis for choosing team structures. It
is hoped that through joint efforts with
industry a data base like this can be
developed which can further broaden our 
understanding of software development as 
a team effort. 


SESSION OVERVIEW
MANAGING END USER COMPUTING 



Thomas N. Pyke, Jr. 

National Bureau of Standards 
Washington, D. C. 20234 

We are now in an environment in which we observe the proliferation of small 
computers due to their rapidly decreasing cost along with continually increasing 
capabilities. Fueled by vendor and media promises, direct access to computing 
resources by end users, including use of micros, is becoming widespread. Many
end users are requesting or even demanding increased and improved access and
relevant support from management. Many in management are encouraged by these
developments and would like to further them, but are increasingly aware that
there is a growing need to impose appropriate organizational constraints.

The objective of this session is to explore and summarize the issues and 
motivations in implementing an integrated approach to supporting end user direct
access to computing resources. This is done by providing views from three 
perspectives: historical, user, and management. 

The first talk will present a historical view of the evolving information 
resource center (IRC) concept, provide a set of working definitions, and give 
focus to the central themes involved. It will explore the objectives that may
be achieved by means of an IRC program and identify those who are likely to be 
the driving forces behind and advocates of such efforts. The talk distinguishes 
between various types of support, including handholding of end users of main- 
frame services and establishing microcomputer support centers. It reports on 
activities underway to integrate such support and implement the various components 
of such an IRC program. It will also distinguish between the "helping" vs 
"controlling" nature associated with an IRC. 

The second presentation, "Information Center - The User's Answer to the
Computer Room," will look at support structures from a user's point of view.
As the end users sense the potential of the new technology in improving their
individual performance and productivity, what types of help in realizing this
potential are they seeking? What kinds of computing resources do they need? 
What help do they need in identifying and evaluating the information 
alternatives available to them? These and other questions will be explored. 

The final presentation, "Support Structures: A Management Tool," will advance a
management perspective on support structures. The view of these structures as a 
vehicle by which management can foster individual productivity on an incentive 
basis and at the same time influence the direction of their use will be examined. 
The role of support structures to ensure adherence to organizational constraints 
(which address such considerations as data integrity, auditability, and security)
will also be explored. 


INFORMATION CENTERS: THE USER'S ANSWER TO THE COMPUTER ROOM

Esther P. Georgatob

Veterans Administration
Office of Data Management and Telecommunications
Washington, D.C. 20420



ABSTRACT: New "Information Centers," which utilize and promote personal
computer use, have gained popularity in many large businesses and are now finding
their way into the Federal Government. Their most interesting feature is that
the users operate the equipment themselves. While the Centers aren't capable
of doing the large jobs currently handled by the typical DP department, they
are introducing the user to personal computing and to more advanced data
processing theories. This paper briefly describes the Information Center concept
and discusses the establishment of a Center at the Veterans Administration,
Washington, D.C.

KEY WORDS: Information Center, DP department, personal computers, users, data
bases, user needs, Implementation Plan, work environment, Information Technology
Center (ITC), staffing, publicity, stand-alone, data manipulation, networking,
testing, modifications, office automation



1. Introduction 

About a year ago, COMPUTERWORLD ran a cartoon
which illustrated how the role of the user
has evolved through the sixties, seventies, and 
eighties. It depicted the DP department in the 
sixties as monarch over the entire computer 
environment. The user was little more than a 
faithful subject. But in the eighties, or so 
the cartoon predicts, the user will be the one 
holding the keys to the computer room — and the 
DP department will be pretty much left out in 
the cold. 

Whether this prediction ever comes to pass 
is arguable at best. But without question, the 
user's growing involvement in his or her com- 
puter needs is a trend that is here to stay. 
In fact, the smart DP departments have already 
begun abdicating their absolute authority in 
favor of more democratic arrangements. 

One of the most successful of these 
arrangements involves the operation of a self- 
contained unit called an "Information Center." 
While its introduction can't be heralded as the 
final solution, it does seem to be an idea 
whose time has come.



2. The Concept

An Information Center is an independent 
organization that uses the personal computer as 
its primary tool. While it normally operates 
within the DP department, many users are establishing
Centers within their own organizations,
especially when the DP department has been slow
to act.

The Center's main purpose is to provide the
user with a means to accomplish smaller jobs not
easily handled by the organization's huge computer
systems. Its strength lies in its ability
to produce a product quickly and easily, and to
deliver results that almost guarantee user
satisfaction. This third, seemingly impossible task
can be accomplished because the Information Center
is unique in one important way: The actual
work is done by the users themselves.

A second important purpose of the Information
Center is to provide a systematic approach
to the growing arena of personal computing. The
very nature of the Information Center's imposed
structure allows users to purchase their own
equipment, but at the same time keeps them
within a prescribed structure. This allows for
networking capabilities and consistency when
outside services are used or new applications
are added. In essence, it puts the DP department
in a guidance role rather than an operating
one.

Finally, so as not to forget the user's 
current needs, the concept still leaves the DP
department with its large global systems,
entities that meet needs not possible with
personal computers. 



3. The Approach 

While the concept of the Information Center
is relatively new to the government, it is
not new to business or academia. Corporations
have used the concept for several years to bolster
productivity and reduce applications backlogs.
Colleges and universities have employed
the concept even longer, having begun making
their data processing resources available to
students back in the sixties. 1

The success these organizations have
achieved has not only verified the validity
of the concept, it has also helped define the
approaches Centers are likely to follow.

Generally speaking, Information Centers 
fall into two basic categories: The first kind 
provides users with access to the organization's 
data bases; the second uses several types of
personal computers as stand-alone units. Both,
however, have the same responsibilities:

o Establishing the user-friendly environment
required of the concept

o Selecting whatever software tools are
appropriate

o Ensuring that the proper data is accessible
and secure

o Acting as consultant to the user community
on an as-needed basis

In addition to these responsibilities, most 
Information Centers have taken on the task of 
training the users. This is not only something 
of a necessity, it is a good way of establishing 
good rapport between the DP department and the 
user.

Most Centers also operate a library of 
trade publications, software packages, and 
vendor literature. Others provide space and 
plug-ins for users who want hands-on access to 
vendor systems they are evaluating for procure- 
ment. Still others provide areas where vendors 
can display and demonstrate their products. 



1 "The New Info Centers," DATAMATION 
(August 1983), p. 30 

2 "System Development Mythology,"
DATAMATION (August 1983), p. 276



4. The Implementation 



As a means of discussing how a Federal
agency might go about implementing an Information
Center, it was suggested that we describe
our own experiences in establishing a Center at
the Veterans Administration in Washington, D.C.

Since there are only a few examples of an
agency setting up an Information Center without
the help of outside consultants, we hope this
documentation will prove useful to our Federal
associates who are considering a Center of
their own.



4.1 Inception 

The decision to establish a VA Information 
Center was made initially in February 1982. It 
was decided that it should be located within
the DP department, in this case, the Office of 
Data Management and Telecommunications. The 
physical location of the Center was to be the 
VA's Central Office in downtown Washington. 



4.2 User Needs 

Regardless of the agency function, every
Information Center must lay the user's needs as
its functional cornerstone. These needs determine
how the Center should be constructed, in
terms of both equipment and size.

The needs of our users called for a combination
of the two types of Centers discussed in
Section 3. That is, access to agency data bases
was required, in addition to the services only
stand-alone systems could provide. Equipment
was thus procured that provided for data base
access, computer graphics and statistical modelling,
the display of plotted information, and
interactive processing of data.

As for size, the Center not only had to
house the equipment, it also had to be large
enough to meet other user needs, e.g., vendor
demonstrations, the training of user personnel,
and the operation of a personal computing library.



4.3 Implementation Plan

After we had defined the user's requirements,
we began translating them into an Implementation
Plan, with the intention of meeting
the user's highest priority needs first. However,
we soon ran into several stumbling blocks
which made a specific, highly detailed plan 
difficult to formulate. 

For instance, all electrical work and carpentry
had to be handled by GSA, the lessor of
the VA building. That meant our renovations
were completed according to their schedule, not
ours. Consequently, the plan couldn't be structured
toward the Center's opening, since that
date was not known. Even the staff could not
be brought together, since the area it was to
share was still occupied by other employees.

But in spite of these difficulties, the 
Implementation Plan still proved useful. It 
was the rallying point of all efforts and the 
means by which schedules and deadlines were 
kept track of. Even when delays occurred, we 
knew where we were going — if not necessarily 
when we would get there.



4.4 Work Environment 

Naturally, the physical shape of the Center
had to adhere to the room in which it was to be
placed. As can be seen by the floor plan (see
Figure 1), our room was large enough, but it
was rather oddly shaped. In addition, it was
poorly lit and in a state of disrepair. Our
first priority, then, was to determine how best
to transform this space into a pleasant, practical
working environment.

We decided that the Information Technology 
Center (as it was formally named) could be 
divided into seven distinct functional areas, 
grouped by use. These included the following: 

o Office automation 

o Computer graphics 

o Library 

o Reception area 

o Time sharing 

o Personal computing 

o Plotting 

The room's odd shape lent itself very well 
to this concept, and we were able to place the
equipment in separate areas and still have room
left for the library and reception area. In
addition, the middle could be left for demonstrations,
seminars, and training. We also had
space available for future needs. 

After the floor plan was conceived, attention
was turned to more practical items. To
begin with, the existing chilled air cooling of
the room was not adequate, considering the
space we had (1000 sq. ft.) and the amount of
equipment we would be operating. Consequently,
new air conditioning ducts were added. A new 
ceiling was also hung, to accommodate additional 
light fixtures. A separate power supply was 
set up to eliminate voltage deviations in the 
laser printer. 

After these matters were taken care of, 
attention was turned to the overall tone of the 
work environment. We wanted to make the room 
light, airy, and as esthetically pleasing as 
possible. Furniture was chosen to reinforce
this theme, although comfort and practicality
were the final determining factors. In addition,
the room was ringed with an electrical
cable raceway, to conceal wires and to keep
them from under foot.

The resulting effect was one of freedom
and open space, perfect for creating a helpful
atmosphere and for enhancing user confidence
and access.




Figure 1. Floor Plan of the VA Information 
Technology Center (ITC) 



4.5 Staffing 

The most critical part of our implementation
was the hiring of the staff. Again, our
user's needs were the standards by which personnel
were chosen. Since the ITC is geared
more toward technical consulting than training, 
a staff was selected to reflect expertise in 
technical areas, including programming, systems 
and statistical analysis, modelling, and tele- 
communications. 

Placing the emphasis on the technical side 
will also allow us to more easily keep abreast 
of the new technologies constantly being evaluated,
modified, and developed by the Center.
It has also proved useful in setting up and
maintaining new equipment.



4.6 Publicity

It has been said that word-of-mouth advertising
is the best way to publicize a product
or service, since it offers the advantage of
instant credibility. But in order for it to be
effective, you must have a lot of satisfied
customers. Fortunately, in the few months we
operated before renovations began, the ITC was
in constant use, and most importantly, users
were excited about the concept. From the beginning,
we emphasized that the Center existed for
their use, and we still try to get our users as
involved with its success as possible.

However, we also decided it was important
to use a more expansive publicity program, one
that would blanket all of the user organizations
and locations. To this end, a brochure was
written that outlined the ITC's purpose, services,
location, and use. We also instituted a
bi-monthly newsletter, providing the user with
information on new techniques, new systems, new
services, and possible problem areas. Finally,
we wrote articles in various VA publications,
describing the services offered and the concept
in general.

All in all, we have been able to cultivate 
some positive opinions about the ITC, which in 
turn has kept us encouraged and especially alert 
for ways to improve our services. 



We also intend to evaluate, test, and modify
equipment and software coming on the market.
One of our first efforts will involve creating
a local area network by linking our microcomputers.
This is what we'll use to run our
office automation system. It is a brand new
concept and will take a lot of testing and modification
to make it work.

Research will also begin on the design and
implementation of a multi-user microcomputer.
This project will use products from a variety
of different vendors and will require the staff
to find a way to link the products into a
workable system. However, it is an important
concept, since the result will give us a very
versatile system, and one that can use existing
equipment.



4.7 Current Status 

In less than nine months, the ITC has been 
staffed and trained, renovations have been com- 
pleted, and host connections are in place. Most 
of the equipment is already in use, and the rest 
is on order. While we're not yet totally oper- 
ational, users have already begun signing up for 
demonstrations and training. The ribbon-cutting 
is scheduled for November of this year. 



4.8 Future Plans 

While most of our efforts have centered on 
immediate needs, we have not lost sight of our 
responsibility to accommodate the growth and 
sophistication of the user. A good foundation 
has been laid, and we are looking forward to 
some exciting possibilities. 

To begin with, our users will soon be using 
existing stand-alone units as part of an inter- 
connected network. This will enable them to 
supply their own data and then to create a 
variety of ways to display it. The next stage 
will find the user accessing data bases through
telecommunications and preparing graphs and
other data on-line.

As for the potential of microcomputers, 
users are already employing software packages
to manipulate data. Next will come provisions
for the user to access a data base, make an
extraction, manipulate the data using one of
the packages on the micro, and send the data
points to the graphics system, the
plotter, or the micro's graphics capabilities.

As our users begin buying personal com- 
puters for their own organizations, our staff 
can assist them in configuring their systems, 
selecting software, and helping them with 
problems they might encounter. As the number 
of these users grows, the ITC will sponsor
users' groups and a bulletin board service,
which will interconnect users in Washington 
with those in field stations. 



5. Conclusions 

As we previously mentioned, very little 
documentation exists for setting up an Information
Center in the Federal government. While
this paper is not sufficiently detailed to
provide that documentation, there are a few
lessons we learned that might help interested 
organizations sidestep some problems. These 
are as follows: 

o Be User Oriented - Establish the services
you are to offer in line with the functions
of your agency and your users' needs.
Once this is determined, be single-minded
in getting the staff, equipment, software,
and facilities to support your services.
Also, realize that everything should be
planned or done in response to an existing
or anticipated user requirement. Keep in
mind that the Center exists to meet the
needs of the user.

o Define and Restrict Your Services - Unless 
there is an unlimited staff, restrict the 
use of the Center to the services you are 
providing. Stay away from complicated 
projects that are part of the normal data 
processing function. 

o Train the User Well - Be prepared to meet 
the needs of the user from an educational 
standpoint. Users will require a lot of
technical support, especially in the beginning.
No package has instructions so well
written that an untutored user can learn
without aid. It is also a good idea to
have initial user instructions taped near 
each piece of equipment. 

o Prepare an Implementation Plan - Time spent 
preparing an Implementation Plan, however 
difficult it might be to follow, is well 
worth the effort. Resist the temptation 
to stray from it. In order to be success- 
ful, all the details must be accounted 
for. In addition, don't forget about the 
descriptions of services, policies, procedures,
and standards you'll need before
going operational.

For those of you who are already convinced 
of the advantages offered by an Information 
Center, we hope we have expanded your under- 
standing of the concept and perhaps opened your 
eyes to some of the exciting possibilities it 
can offer. 

On the other hand, for those managers in 
today's DP departments who still think of
personal computers as over-sized video games,
you'd better reconsider. Yours may be the
monarchy left out in the cold. 



We are especially grateful to Mr. Jack Sharkey, 
Director of the Office of Data Management and 
Telecommunications. Without his support and 
belief in the Information Center concept, the 
ITC could not have proven so successful. 

AN ORGANIZATION MODEL AND CASE STUDY
FOR MICROCOMPUTER CPE

In this paper, computer performance evaluation is viewed from the
perspective of assimilating the microcomputer into the organization.
It presents a way of thinking about problems and answers. It does
this by presenting a model and some sample components of the model.
The approach is an attempt to fit some microcomputer issues into the
framework of organization development styles.

Key words: End user; microcomputer; microcomputer laboratory; model; 
objective oriented management; organization development; organizational 
tensions; productivity; reference system; team work; technology. 

1. The Challenge of Micro Infusion
Into the Organization

This paper assumes that the micro infusion 
is not an open and shut case. There seems to be 
agreement that micros are being placed in
organizations at a dazzling pace. However, there 
seem to be valid considerations both supporting 
and opposing aspects of what is happening. And 
there are many differing conditions impacting 
what should be done next. These involve cost, 
system reliability and the self-determination of 
the end-user. 

This paper explores a way of thinking about
infusion questions. A suggested rationale is 
expressed in terms of a model, application of the 
model, and case study descriptions for applying 
parts of the model. 

Before getting to the model, what are the 
kinds of issues we need to look at? Where the
micro is a blessing and where it is a curse 
relates to organization objectives in both 
tactical and strategic areas. Tactically, 
acquisition and application to specific tasks 
raise questions of product choices, system 
planning, and expenditure of resources.
Strategically, the question is how the micro is
conceived as a tool and how it fits into
organization dynamics. 

It is often expressed that white collar 
productivity has yet to be impacted substantially 
by technology. Equipment expenditures per white 
collar employee are only a fraction of that per 
blue collar employee. However, is expenditure
alone enough? Must not application of technology
deal effectively with the success of the 
organization and with the factors of integration 
and of tension? 

How do we pursue integration while 
encouraging tools which foster tension? 
Assimilation of microcomputers into the 
organization relieves some tensions and 
stimulates other tensions. Let's try, in a 
moment, to think of some examples of tension. 
Management of such tensions should direct 
expenditure of resources toward the goals of the 
organization. Micros must be assimilated with 
goals like the following in mind: 

1. Maximization of productivity 

2. Avoidance of waste 

3. Maximization of profitability 

4. Maintenance of data integrity 



You can probably think of some other goals.

To achieve the goals of the organization, 
integration of the elements of technology, human 
factors, and management action is necessary.
What kinds of models can be used for this? What
kind of thinking is needed? 

This paper suggests the following: 

1. An organizational model

2. Examples of integration; illustrations of the
dynamics involved.

3. Two case studies illustrating constructs of
the model. We'll discuss "constructs" later;
for now, they are "organizational tools", or
"frames of reference."

2. Tensions in the Organization

Can the model we develop accommodate
tensions, integration and goal-seeking? My
examples and constructs are an attempt to
apply the model to micro infusion.

Before describing the model, what are some
tensions in the organization? First, what are the
tensions which have led to microcomputer infusion
and, second, what are the tensions which have resulted?
Are there models to suggest how to view these
tensions in a constructive way?



2.1 Tensions Encouraging Micro Infusion

Pressures (in tension)                    Micro Promises

Appeal of computer power                  Availability
  vs. Application backlog

Need for end user involvement             User-friendliness
  vs. Technology skill requirements

Knowledge end-user has of his             Application packages
  own business
  vs. Knowledge of ADP as an
  autonomous discipline

End-user need to control schedules        Control of resources
  and resources by his own priorities     and schedules in
  vs. ADP Operation need to control       the user area
  schedules and resources

2.2 Tensions Resulting From Micro Infusion

Accessibility of the application          Need (a need will
  for implementation                      exist because of
  vs. Need for thorough system            the tension)
  planning and preparation

Availability of numerous                  Need
  application system tools
  vs. Unclear appraisal of quality

Promise of ease of implementation         Need
  vs. Training needed for quality
  installations

Advantages of standardization,            Need
  integration and some uniformity
  throughout the organization
  vs. Advantages of individual
  end-user optimization

Need to maintain adequate data            Need
  communication links
  vs. Need to avoid time sharing costs

Advantages of integrated                  Need
  application software
  vs. Advantages of containing the
  magnitude of the configuration

Variety of devices on the market          Need
  vs. Maintenance of compatibility



These needs require work. For example, 
consider the evaluation of integrated, multi- 
task, executive productivity packages. 

One study asserts that to be able to
distinguish the important differences between
"MBA" (from Context Management Systems) and "1-2-3"
(from Lotus Development Corporation) requires
weeks of concentrated study even by a
computer-literate manager. That kind of effort is costly,
but isn't it also good for the organization? 



Tensions in the work-life can give energy to 
the enterprise. Contradictions, cross-purposes,
disparities, paradoxes, and exceptional 
observations provide sparks that may lead to 
discovery. Such tensions may suggest that we are 
looking at something the wrong way. It is by 
this route that innovation becomes a tool for 
problem solving. Sometimes undoing faulty 
integrations may be half the game. A pre- 
requisite for originality is the art of 
forgetting, at the right time, what we 
already "know." 

So tensions can be used constructively.

Now how does the model relate to this?
Modeling is a way of looking at things. It is a
way of turning noise into information. It 
provides the rationale for action in a mixed and 
changing environment. 

Is not how you look at things important? 
Poincare was Einstein's senior by 25 years. 
Poincare had the essential facts for the special
theory of relativity before Einstein did.
Einstein made the discovery because he had the 
way of looking at things. 



So we need a way of looking at things which
accounts for tensions, goals, and frames of 
reference. A model, I suggest, is the 
"Dialectical Organization Model." 

The Dialectical Organization Model provides 
a way of looking at the tensions of micro 
infusion and fitting them into organization 
development.

In practice, application of the model 
follows a cycle like the following. 



3. The Dialectical Organization Model

3.1.1 Recognize the "Squeaky Wheel" (TENSIONS)
      A person with power to have results becomes aware of
      conditions needing attention.

3.1.2 Review Goals (GOALS)
      In order to diagnose the degree, nature, location and
      scope of need.

3.1.3 Diagnose tensions or inactivity (DIAGNOSIS)
      Determine whether the squeak was from tension or from
      inactivity (equilibrium).

3.1.4 Zoom to bring territory into focus (ZOOMING, INTEGRATION,
      EXPOSURE)
      Select object area. If inactivity, zoom to expose. If
      tensions, zoom to integrate. Aim is to prepare opportunity
      to achieve growth toward goals.

3.1.5 Apply constructs for interpretation and action (CONSTRUCTS)
      Reference systems, beliefs, laws, principles, perspectives,
      criteria, environmental frameworks, pictures, models.

3.1.6 Experience and monitor results (GROWTH)
      Progress toward goals, organizational learning, growth,
      integration, new tensions.

3.1.7 Go to 3.1.2

Above is the operational sequence of the 
model. I'm sure you have experienced the 
processes.

Tensions provide the [illegible].

Goals provide the sense of direction. 
Zooming focuses, as with a camera. 
Constructs provide the rationale. 

Integration is sometimes the productivity 
result, and sometimes new tensions are. 

Growth is where value is achieved. 

The dialectical sense of the process is 
made meaningful by the goals and constructs. 
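
As a minimal sketch only, the operational sequence (3.1.1 through 3.1.7) can be written as a loop. The `Squeak` record, the `zoom_mode` and `run_cycle` functions, and the caller-supplied `monitor` function are all hypothetical names introduced for illustration; they are not part of the model as presented. The sketch assumes that monitoring results (3.1.6) either surfaces a new condition, which sends the cycle back to the goal review (3.1.2), or surfaces nothing, which ends it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Squeak:
    """A condition needing attention (hypothetical structure for illustration)."""
    description: str
    source: str  # "tension" or "inactivity" (equilibrium)


def zoom_mode(squeak: Squeak) -> str:
    """Step 3.1.4: zoom to integrate tensions, zoom to expose inactivity."""
    if squeak.source == "tension":
        return "integrate"
    if squeak.source == "inactivity":
        return "expose"
    raise ValueError(f"unknown squeak source: {squeak.source!r}")


def run_cycle(
    first: Squeak,
    goals: List[str],
    monitor: Callable[[Squeak, str], Optional[Squeak]],
    max_passes: int = 5,
) -> List[str]:
    """Run the cycle; `monitor` stands in for step 3.1.6 and may return a new squeak."""
    log: List[str] = [f"3.1.1 squeaky wheel: {first.description}"]
    squeak: Optional[Squeak] = first
    passes = 0
    while squeak is not None and passes < max_passes:
        passes += 1
        log.append(f"3.1.2 review goals: {', '.join(goals)}")
        log.append(f"3.1.3 diagnosis: {squeak.source}")
        mode = zoom_mode(squeak)
        log.append(f"3.1.4 zoom to {mode}")
        log.append("3.1.5 apply constructs")
        # 3.1.6: experience and monitor results; a new squeak may surface.
        squeak = monitor(squeak, mode)
        # 3.1.7: loop back to 3.1.2 while new conditions keep surfacing.
    return log


def monitor(squeak: Squeak, mode: str) -> Optional[Squeak]:
    # Assumed scenario: addressing the backlog surfaces one training tension,
    # after which nothing further needs attention.
    if squeak.description == "application backlog":
        return Squeak("training needed", "tension")
    return None


log = run_cycle(Squeak("application backlog", "tension"), ["productivity"], monitor)
for entry in log:
    print(entry)
```

The `max_passes` limit is a design choice of the sketch, not of the model: the model itself describes an open-ended cycle, so a practical rendering needs some stopping rule.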



You may visualize a number of ways these 
processes fit the micro world. 

4. Examples of Integrations

One of the noticeable outcomes that can
result from applying constructs to the tension
resolution cycle is integration. You've seen
micro situations where integration is
appropriate. Here are examples of organizational
integration emerging out of the micro's
context.



The four examples I have picked are: 

1. Fact-based management

2. Manager computer literacy

3. The pro-user

4. Re-defined information

4.1 Management Methods

Contribution of Micros to Fact-based
Objective Oriented Management

Perhaps you have observed ways in which 
micros impact management approaches. With 
spreadsheet manipulation and graphics, it is 
possible to have more immediate interpretive 
access to factual information by
decision-makers.

This is an example of the way in which a 
technological development fits with a management 
style. Accordingly, it becomes integrated into
the behavior of the organization.

As other territory is brought into focus, 
constructs such as Decision Support Systems,
Matrix Organization, and Zero-Based Budgeting may 
use the micro as a vehicle. In each case, 
integration occurs. 

Each individual who proposes alternatives 
will be supporting them with readily available 
data. The spreadsheet and graphics, then, are a
fit for other non-technology dynamics that are
occurring in the organization.

The idea that there is a fit does not mean 
unchallenged harmony. Integration is not
tranquility, but is growth erupting out of
ferment that makes sense by the rationale of the
model. 

Likewise, there will be such problems as 
reliability of statistics from remote files
downloaded from a data base. But the point here is
that the support that is marshalled for an 
alternative is based upon prototyping that an 
individual has done at a keyboard with a 
financial model. 

4.2 A Cost Incentive for the Natural
Cultivation of a New Generation of
Computer-literate Middle Managers

Another illustration of integration may be 
management computer literacy. For years 
automation proponents have looked for the day 
when end-users would be qualified to see systems 
from the perspective of ADP components. To some
this did not seem to be imminent until a
generation of managers was replaced by a
generation of graduates who studied business 
courses which included ADP. Even if suitable on- 
the-job training were available, how could that 
much training occur for an entire generation of 
managers? 

It now appears that such will happen as a 
grass roots process. 



Where economics spurs the acquisition of 
micros, users will be prompted to become 
technically functional with them. Whether they 
will universally be exposed to good system
development procedures or not, they will at least 
achieve a conversant level of functionality. 
Individual training will be demanded. 

Hence a compatible goal will be achieved 
almost as a spin-off, or as an itch that gets 
scratched. 

4.3 Emergence of the "pro-user" 

Dissipation of the Wedge Which
had been Driven Between the
ADP Technician and the End-user

Ever since the ADP professional became aware 
that users exist and began to call them "end- 
users" there has been a polarization between 
them. It may be time for this distinction to 
evaporate. Would this not be an example of 
integration? Just as Alvin Toffler sees the
roles of the consumer and producer merging into 
the "pro-sumer", perhaps we also are seeing the 
emergence of the "pro-user." 

A "pro-user" may be anyone who uses computer 
products to process and access the information he 
wants. 

The previously labeled "end-user" may become 
more skilled with his new tools than is the ADP 
professional. Intensity of computer related 
skills may no longer be a role distinction 
between these groups. 

Everyone will be managing information for 
organization goals. And everyone will have to 
play keep-up with the technology, regardless of 
whose turf he is functioning in. 

This can be a new treatment for 
provincialism in the organization and a case of 
integration. 

4.4 Re-defining Information 

In response to the question, "What is 
information?" I'm sure you are developing some
new answers. How we are re-defining our view of
what information is speaks clearly of
integration. At one level it emphasizes the
global scope of concerns. At another level it
speaks of the character of the organization which
is defined by its information.

Far from the early ADP view of data as
transactions only, information now comes from
the PC, the terminal, the word processor, the
mailbox and the office cluster.

Information can be as follows:



Corporate data 
Local ADP data 
Research data 
Processed data 
Extracted data 
Administrative data 
Decision Support data
Word Processing data 
Operant data 

The data may not be different. Its usage 
is different and has meaning throughout the 
organization — and when one makes interpretations 
that run throughout the organization, that 
suggests integration.

This may sound like the archaic M.I.S.
promises. The difference is that M.I.S. presumed
that one could define in one mega-plan a super
definition of all data in the organization. The
dialectical/construct model does not assume that
static view. It works with process. The micro
allows us to re-define information. Such
re-definition can be at the service of the
organization. Accordingly, integration comes by
applying new constructs to old data. Thus,
integration is based upon flux, not upon static
definition. The process of zooming, construct
application, and goal orientation encourages that
flux to support the organization's productivity
and profitability.

In addition to its global nature, the new 
information reflects the character of the 
organization. All of industry, government,
economics, civilization and culture depend upon
the symbols of information. What makes the
differences among the Super Bowl, the Mardi Gras,
a high school commencement, the Miss America
Pageant and the Democratic National Convention?
Isn't a lot of it what data is taken to mean and
how it is processed?

As more job functions in the organization
become knowledge-defined and information-constituted,
the character of the organization
becomes more and more dependent upon those
information functions.

Just as the DNA code is a constituent
component of the make-up of an organism,
information is a constituent component of the
organization. In that sense, the organization 
is information. 

So, the way in which information is
manipulated by the many end-users impacts
integration of the organization. This
manipulation is supportive for the organization
insofar as it is directed by the dialectical
model dynamics.

We have noted that the dialectic can employ
integration in the example areas of management
methods, literacy, the pro-user and re-definition
of information.

The tools used for accomplishing those
integrating processes are the "constructs." I am
now going to discuss two examples of constructs
from current organization experience.

5. Examples of Constructs

I have mentioned that constructs can be
reference systems, beliefs, laws, principles,
perspectives, criteria, environmental frameworks,
pictures and models.

The two case study examples have all these 
elements, but can most clearly be considered 
environmental frameworks with reference system
underpinnings.

The construct serves as a tool for the
dialectic modeling process. The construct
provides the ground rules so that participants
know what game they are playing, so that actions
have continuity, and so that inappropriate
actions are not taken.

5.1 The Micro-lab

A construct I would like to suggest as a
case study is the use of a microcomputer
laboratory. Sometimes a construct is a stated 
philosophy. In this case, the lab is not a 
philosophy, but it represents a philosophy. 
Basically it is the philosophy which treats the 
end-user like an adolescent who is going to 
college. Rather than controlling him, you give 
him some new tools and trust his background to 
carry him on from there. The lab itself is an
"environmental framework" in the sense that it is
an organizational entity. Its existence 
contributes to the execution of the dialectic 
model. At the present time, at the State of 
Missouri, we are in the first months of 
establishing such a laboratory. 

The context for the lab is an environment in
which numerous divisions will be considering 
acquisition and application of microcomputers. 
The division sponsoring the lab offers to be of 
service to the many user divisions. The sponsor 
division also is interested in the overall health 
of ADP enterprises in the user divisions. As far
as authority is concerned, the user divisions are
essentially autonomous with regard to
microcomputers.

Micro proliferation is seen as both an 
opportunity and a possible danger. Its main 
strengths are its weaknesses. Its rapid payback, 
responsiveness, accessibility to "non- 
professionals", freedom from constraints, and 
piece-meal decision items suggest to many
observers the need for coherent multi-division 
leadership and planning mechanisms. 

The dialectical construct model suggests the 
micro laboratory as an approach. 

First of all, the model suggests that we ask
the right question. Some might be tempted to
ask, "Who is right, who is wrong?" or "How
does an organization control an elusively
acquired resource?" or "What resource type is
cost beneficial?"

Rather than those kinds of questions, the
model would suggest the question, "How do we
contribute to organizational growth by addressing
integration and tension factors?" The lab
encourages access to the tools needed to
accomplish integration like the four forms of
integration that we discussed in the last
section (4.1 - 4.4).



This is done by giving end-users an
opportunity for hands-on experience, education,
assistance, prototyping, comparing, project
magnitude assessment, demonstration and
experimentation.

The primary service is access to operational
resources, both hardware and software. The
posture of the lab is to adapt to user
functions. What one user cultivates is to be
shared with others. So, the climate arises out
of practice. Likewise, skills that are found to
be needed are cultivated and addressed with 
training aids. A variety of hardware and 
software is maintained to offer the user the 
primary options that others in the organization 
are finding helpful. 

It is intended that divisions will benefit 
in their own system development projects, 
planning and acquisition of products. 

Exposure to the lab may or may not lead to 
uniformity of some kind across the multiple
divisions. The philosophy is that users are
self-directing, but should have the advantage of the
experience of others and exposure to major
resources that others are finding useful.

How was the micro lab suggested by the
dialectical construct model? 

On the surface it may have seemed that we
had a technological or an economic issue. In
fact, it was an issue of organization dynamics 
and the solution is an organization development 
solution. 

The lab reflects the service posture of the 
sponsor division. This is part of the
environmental framework and reflects an
organizational viewpoint.



It is not a territorial power kind of move.
It is a contribution to a global learning 
situation. 

The lab will be used where the need arises. 
Thus, in response to tension, its use becomes a 
part of the dialectical cycle. 



Organizational learning is the result. And
organizational learning is an experience of 
people and agencies who modify their approaches 
as they discover new results. 

5.2 TEAM Development of Organizational
Information Management Services

Another construct example comes from an 
experiment with "teamwork." 

The context for this construct has been the
need for definition of organizational functions,
services and projects. Its major use was
during a time of reorganization transition.
Further use will be primarily for brainstorming
and joint planning in new project areas.



Pre-conditions

The reorganization transition was a phase
during which several staff members were in new
positions, the division director was new, and
partial reorganization was around the corner.
The "Team" involved eight (8) people in a
planning and coordinating section. This section
is part of a division of 140 employees. The
division, in turn, performs services for
departments with several thousand employees. The
scope of the Team effort was primarily within its
own area of responsibility. The nature of that,
however, extended concern into matters of
division level functions.



The Team's section had responsibility areas
which were relatively easy to adjust by
management discretion. The new director had
conceptual interests and skill which were
adaptable to the organizational dialogue which
was emerging. The staff was vocal. The director
was ready to hear and to interact.

The staff had interest in discussion about 
goals and approaches. The new director had 
interest in reviewing current practice and 
forging new directions. 

Approach 

A staff member suggested a one-day
brainstorming session at a remote site in a
retreat setting. The suggestion was accepted and
eight (8) people met spanning three (3) levels of
supervision.

The agenda included presentations and 
dialogue about what was being done and what was
recommended.

Near the conclusion the director threw down
the gauntlet, saying in effect, "You are a team,
carry on from here in your own way — and there is 
something I want as output. I want your 
recommendation as to the criteria for the 
services this division should be offering." 

This was, in effect, an invitation to start
at ground-zero for what the section should be 
doing. 

The Team began weekly meetings with rotating 
chairmanship. Throughout its functioning there 
were expressions from some indicating desire to 
be told more about what was expected. Others
encouraged moving ahead. The director insisted 
on keeping hands-off. 

Results 

The weekly meetings were held for four 
months, with substantial individual and 
subcommittee work being done between meetings. 
Formal documentation was developed. 
Recommendations were made of functions the 
division should be performing. Reorganization
occurred. New major directions were
established.

Under the reorganized structure some members
were uncomfortable with continuing Team 
activity. Other members wanted to continue. 

Management's decision was to form a
continuing Team of the new supervisors. The
total staff would function as a team on an "as
needed" basis. 

Relationship to the Model 

The dialectic application of constructs was
experienced throughout the Team experience.
There were repeated cycles of zooming for tension
and for integration. An interesting thing about
that is that what the director was wanting to see
was process. He wanted to see those three levels
of folks hammering out ideas in dialogue. And
that is a natural for the dialectical model.

I will mention here some tensions leading
to the Team idea, some tensions resulting and 
what the contribution was to the division. 

Tensions Leading to use of the Construct

A. Lack of sense of identity by individuals with
the stated objective of the division.

B. A variety of unstructured and undefined
viewpoints about what the division was doing
or should be doing.

C. Uncertainty about roles, expectations,
and responsibilities.

Tensions Resulting from Use of the Construct

A. Where expectations of some members of the
Team (depending upon their perception of
objectives) became fixed, they felt a lack of
follow-through.



B. Where the Team idea reflected a participative 
management style, sometimes staff liked being 
included; sometimes staff did not like the 
responsibility, the loss of targets for 
complaints and the fear of being penalized 
for expressing views disfavored by 
supervisors present.

C. Where authoritarian management style
moves re-asserted themselves; sometimes staff
were relieved at being led again and at
having problems solved by someone else;
sometimes staff disliked being told what to do when
left out of the decision process.

Contribution of the Experience to the Division 

The team experience has been an example of
the state of flux that often characterizes the
dialectical process. There has been a lot of
give and take, a lot of ambiguity, and some solid
ideas that benefitted from the scrutiny they had
received. And the team approach exhibits the
thematic, methodological traits of a construct.
There has been organizational learning and growth,
in the sense of staff working together in a new
context. Clarity of work has been evolving.

There has been progress toward goals.
Emerging out of the teamwork, management has made 
thematic and procedural moves toward re- 
definition of what the division does. 

The idea of the Micro-lab came out of the
Team process.

The microcomputer laboratory and the team 
process provide interpretive frameworks in a 
fluid situation. They contribute structure that 
is not static. Such is necessary for constructs 
in the dialectical process.

6. Wrap-up

As microcomputers are infused into the
organization, parochial interests run head-on
into global concerns. 

The microcomputer has appeal as it is made
responsive to organizational tensions. Its
infusion results in new tensions. Application of
constructs may bring integration or may surface
new tensions if equilibrium needs attention.
Constructs like the lab and the team can work
with the tension cycle for growth and learning in
the organization. Results like fact-based
management, computer literacy, and the pro-user
may reflect integration within the cycle.

The model advocates fluid processes. It 
depends upon change. Likewise it should be 
applied to embryonic areas such as integrated
application packages on the micro and to end-user
application development.

SESSION OVERVIEW
FEDERAL MICROCOMPUTER ACTIVITIES

Allen L. Hankinson

National Bureau of Standards 
Washington, D.C. 20234 



Microcomputer-based systems represent a new frontier in
information technology. These systems offer the potential for
Federal agencies and other information intensive organizations to
improve substantively the productivity of the workforce as well
as the quality of the services being provided.

These organizations face a stiff challenge to realize the
potential offered by this new technology. A major problem is that
the technology is moving so fast that it is difficult to get a
handle on its effective management and use.

Some of the critical issues that must be addressed include: 



when should these systems be considered as
alternatives and/or adjuncts to central facilities;

what are the key steps in the process that should be
used in the selection of these systems;

how should these systems be configured to enhance
the sharing of data and the sharing of expensive
peripherals;

how should applications software be acquired in
order to minimize development and maintenance costs;

how should this technology be "packaged" to meet the
needs of nontechnical end-users.



In this session, Federal microcomputer activities that
involve these and other issues will be explored from three
different perspectives:

• from the perspective of an organization which 
develops Government-wide acquisition programs; 

• from the perspective of an organization which 
develops Government-wide technical standards and 
guidelines; 

• from the perspective of an organization which has 
already taken some innovative steps to deal with
these issues within a Federal agency.

DATA PROCESSING USER SERVICE - A PROBLEM; A PROPOSED SOLUTION



Thomas H. Acklen 

Veterans Administration Data Processing Center 
1615 East Woodward Street 
Austin, Texas 78772 

While data processing technology continues to progress at an ever
increasing pace, techniques employed by data processing organizations to
extend these technological benefits to users remain basically static. Users
are buffered by analysts and programmers from the equipment's capabilities.
Maintenance of existing systems places a growing burden on the data processing
organization, and the development of new applications is people-intensive and
protracted. Users are becoming more dissatisfied with the data processing
organization's inability to respond to their demands. As a result, users are,
in many instances, trying to use microcomputers to fulfill their own needs.
Such maverick efforts, although potentially beneficial to a particular user,
could introduce disarray into an organization's efforts to establish
integrated information systems.

This paper proposes a series of actions which should improve the data
processing organization's ability to serve the user community. The underlying
strategy of these actions is to enable users, within the context of an overall
data processing plan, to provide for many of their own needs thus permitting
the data processing organization to concentrate on the more complex tasks.
The overall objective is to insure more responsive and less costly data
processing support.

Key words: Communications networks; data manipulation capabilities; data
repositories; programming productivity aids; responsiveness; software
improvement plan; systems development methodology.

1. Introduction

During the late 1830's and early 1840's
artist/inventor Samuel Morse perfected the
telegraph and devised a standard code for
transmitting messages electronically. And thus
began the communication revolution and the modern
day need to improve user service.

The user that wished to send a telegraph
message had two available options. Under the
first option, he relayed his message to a
telegraph operator and requested that he code and
send the message to the desired destination. The
operator at the receiving station decoded the
message and relayed it to the specified
individual. If all worked well, the telegraph
operator(s):

(1) understood the user's message,

(2) properly coded it,

(3) routed it to the proper destination,

(4) correctly decoded it, and

(5) delivered it to the intended person.

Under the second option, the user could master 
this technological wizardry that required him to 
speak in dots and dashes, find an operator that
would allow him access to a telegraph key and 
send his own message. Neither approach could be 
candidly described as "user friendly". 

With the invention of the telephone, 
Alexander Graham Bell radically altered the 
electronic communication user interface and the 
method and type of support provided the user. 
The user became the direct conveyor of the
message. Nevertheless, in the early days of the 
telephone, responsibility for routing of the 
message still rested with the equipment 
operator. Today, the user of a modern telephone 
system both routes and conveys the message.

Although there are exceptions in today's
data processing world, user support is about
equivalent to that which existed in the early
days of the telegraph. In order to benefit from
the computer's capabilities, the user must either
depend upon data processing technicians or become
a data processing technician. Inherent in such
an environment are costly delays, poor service,
and dissatisfied users.

The challenge for today's data processing 
manager is to place the analogous telephone in 
the user's hand. First, he must, through a
combination of tools, techniques, training and
trust, eliminate when possible the analysts and
programmers that buffer the user from the
computer's capabilities, thereby allowing the
user to directly utilize this technology.
Second, he must strive to improve the traditional 
user/analyst relationship and the support 
provided the user through this interface. This 
paper proposes a series of actions designed to 
assist the data processing manager in meeting
both objectives. 

2. The Problem 

Our industries and government institutions 
can ill afford the inefficiencies associated with 
today's multilayered, people-intense methods of
data processing systems development and 
maintenance. As the unit cost for computer 
hardware has plunged, the labor cost associated 
with application software development and main- 
tenance has skyrocketed to the point where it 
accounts for the vast majority of an organization's
data processing budget. Even if these
institutions were willing to commit the requisite
dollars, our labor market would be hard-pressed
to provide the number of analysts and programmers
required to meet rapidly expanding data
processing demands. The U.S. Department of Labor
forecasts that between 1980 and 1990 the need for
programmers will increase by about 40% while the
demand for systems analysts will climb by 50%.
It holds that such rapid increases in the demand 
curve will cause a continued increase in labor 
costs and an overall decline in the availability 
of data processing talent and skills. 

Our institutions face the additional threat
that frustrated users, aided by the ubiquitous
microcomputer, will reinject into the data
processing environment (1) nonstandardized design
techniques, (2) duplicative development efforts,
and (3) parochial attitudes regarding the
gathering and sharing of information. These are
reminiscent of the vices that the data processing
world has struggled to free itself from for the
past ten years. This is not to imply that such
unilateral efforts may not benefit the particular
user. However, the nonstandardized and
undisciplined methods of these "computer
illiterates" may, in many ways, do our
institutions a disservice. Our ability to
effectively plan and integrate data processing
services for our organizations, and to devise and
implement systems that share corporate data, will
be impeded by these maverick systems.

3. The Traditional Solution

Reasons for change in our current approach
to user support are readily apparent; however,
the avenues to such change are far less clear. 
In selecting an avenue, it is extremely important 
to understand the particulars of the service 
provided and the specific problems which a user 
has with that service. An often-voiced problem 
with today's data processing service is lack of 
responsiveness. Changes needed by the user 
yesterday cannot be provided for six months or 
longer if, because of other requirements with 
higher priority, they can be provided at all. 
Another frequent complaint is that the changes, 
once finally made, are not what the user really 
wanted or needed in the first place. 

Traditionally, we as data processing
professionals have attempted to address these 
problems by using the approach which we best 
understand, "Give me more—more analysts to 
better evaluate what the user needs and more 
programmers to more quickly code programs that 
meet these needs." We work more closely with the 
user in the early systems design stages to better 
understand the changes that he desires. But this 
technique has yielded basically unsatisfactory 
results because its underlying premise is that
the user knows what he needs to better do his job
when in many cases he does not. We reason that
by establishing larger programmer staffs we may
(1) more rapidly maintain aged and often poorly
documented systems and (2) write more programs to
meet the new demands. However, constant
maintenance of old code becomes increasingly
difficult and the results less acceptable to the
user, and the demand for new code outstrips our
ability to write it. In summary, we use more
resources to do more of those things which have 
proved marginally successful at best. 

4. A Possible Alternative 

Given that the traditional approach to user 
support has proved inadequate, another which is 
more responsive to and more supportive of the 
user must be formulated. Such an approach should 
embrace all phases of the application systems 
life cycle from identification of a need through 
the design, implementation, maintenance, and 
finally redesign of a system that fulfills that 
need. This approach should: 

1. Stress the maintenance of corporate data 
repositories which are readily 
accessible to authorized users. 

2. Feature a data communications network
which allows for the effective gathering 
and dissemination of corporate data. 
This network should be organization/ 
function oriented rather than applica- 
tion oriented. 

3. Afford the user simplified yet powerful 
data manipulation capabilities.

4. Emphasize the self-service approach to
data processing. It should assist and
guide the user so that he may fulfill
many of his own data processing
requirements within the context of a
company-wide data processing system.

5. Recognize those tasks which are best 
performed by the data processing shop, 
and in those cases, use analytical tools
such as demonstration screens and simple
data manipulation capabilities to help
both the data processing shop and the
user in better understanding what the
user requires.

6. Feature a "minute man" team of top-notch
technicians that can rapidly respond to 
unforeseen, priority user needs. 

7. Employ the latest programming productivity
aids to make that application
programming which is done by the data
processing organization less costly and
less time-consuming.

8. Incorporate a master software
improvement plan designed to gradually
free the data processing organization
from the costly and time-consuming
burden of maintaining poorly documented
and obsolete application programs.

9. Utilize a highly disciplined system 
development methodology for all new 
applications to insure that they are
more standardized, maintainable, and
responsive to the user. 

The philosophy underlying this approach is a 
recognition that the data processing organization 
cannot effectively or efficiently respond to all 
user demands. Its objectives are to (1) maintain
an integrated data processing and information 
sharing capability within the organization, (2) 
allow the users to assume responsibility for some 
of their data processing requirements, and (3) 
permit the data processing professional to more 
effectively accomplish those tasks which cannot 
be readily assumed by the user. 

The approach de-emphasizes application
programming by the data processing professional.
It is well to note that the state-of-the-art
will not allow the elimination of this
fundamental task. As we progress beyond somewhat
primitive, natural language programming
capabilities toward artificial intelligence and
voice recognition capabilities, the need for
data processing professionals to create 
application programs will diminish. However, 
this approach concentrates on that which is 
achievable today. A more detailed examination of 
each component of the approach may provide 
insight as to how it may be adapted to your
organization's particular needs.

4.1 Provide Ready Access to Corporate Data

Data, regardless of why or by whom it was 
initially gathered, is potentially valuable to
other organizational elements. Within the
context of data processing, information has
traditionally been viewed as the property of the
gatherer and stored and accessed in a manner that
primarily benefited that individual or
organizational element. The owner (gatherer) of
that data often resisted its use by other 
organizational elements. 

This perception must change. The concept
that data is a corporate resource to be made 
available to all authorized users for the overall 
benefit of the organization is fundamental to 
effective information management. This concept 
can best be achieved by creating a data 
administration function which reports to upper 
management, preferably the executive officer of 
the organization. In the area of information 
management, both the data processing shop and the 
organization's business elements should be
subordinate to the data administrator. This will
insure that users are better served.

The data administrator function can only 
remove the organizational impediments to data
access. The technical impediments can best be
addressed by designing application functions
around a database management system. A modern
commercial database management system will permit
your organization to share data and manage
information. Such packages afford far greater
flexibility in the storage, sharing, and
maintenance of data than was possible under the
flat file concept of data storage. Organization
standards and routines will be required to insure
that data is stored in such a manner that it can 
be easily retrieved and manipulated by authorized 
users. Without such standards your database 
management system will be only a sophisticated 
data storage and retrieval tool. By estab- 
lishing and enforcing standards the data 
processing shop will ensure the existence of an 
information repository capable of serving many
corporate users. 

4.2 Implement an Information Communications 
Network 

To be useful data must be gathered, 
manipulated, and/or disseminated. Traditionally, 
we have designed gathering and dissemination 
capabilities each time we developed the data 
manipulation capabilities of a particular 
application. However, in order to be responsive
to users, it is far more effective to establish a
common data communication network between the
data processing shop and those remote locations
where data originates or is disseminated. Do not
expend resources developing a network for payroll
data or sales data or inventory management.
Instead develop an information communications
network. Use a modular design that allows for
expansion. Select a widely used network protocol
that can support various types of terminals and
processors. Such a common data link will
facilitate rapid development of more effective 
user applications. 

We, as data processing technicians, often 
look at an information system and compare the 
merits of centralized versus distributed 
processing. We must also begin to think in terms 
of centralized versus distributed development. 
We must not forget that a well-planned 
information communications network facilitates 
both. 

4.3 Provide Users Simplified Data 
Manipulation Capabilities 

Under this approach users must be provided 
software packages which allow them to easily and 
economically retrieve and manipulate data. Most 
commercial database packages afford such 
capabilities. Software packages such as SAS, SAS 
GRAPHICS, DYL-280, SPSS, are also designed to be 
used by the data processing novice. Furthermore, 
new product offerings of this nature are being 
introduced almost daily. Consequently, it is 
neither prudent nor economical to have your 
systems analysts attempt to understand and plan 
an application for those tasks that users, if 
given the proper tools, can provide for 
themselves. 

4.4 Support the Concept of Self-Service 

Simply acquiring such capabilities will not 
insure a more satisfied user community. Your 
center's capabilities must be marketed to the 
users. Additionally, key people within your
organization must be convinced that allowing
users to provide for their own needs is sound
data processing management. 

Establishment of an Information Technology 
Center to introduce users to (1) your center's 
capabilities, (2) equipment which can be used to 
access your center, and (3) the basic skills 
required to benefit from the support your center 
offers is an effective mechanism for marketing 
the center's capabilities. Demonstrating the 
capabilities of microcomputers in accomplishing 
small tasks will aid in familiarizing users with 
data processing. Also demonstrating the short- 
comings of small computers in rapidly accessing 
and manipulating large volumes of data is 
important in marketing the concept of fully 
integrated systems. Teaching users how, if 
certain standards and conventions are followed, 
microcomputers can work in unison with your 
center is mandatory if the concept is to be 
achieved. 

Once users are sold on the idea of self- 
service, your organization must be prepared to 
offer ongoing support if they are to be 
satisfied. A mobile team of skilled technicians
that can, upon request, visit a user's shop to aid
them in their tasks will be required. Your shop
will have to guarantee prompt and reliable
service during the hours users require it, i.e.,
guaranteed service levels.

Gaining user interest may be easier than 
gaining key data processing personnel support for 
the self-service approach. Because of legitimate
concerns for the service provided and a lingering
concern for job security, many managers within
the data processing shop may oppose allowing
users access to data and data manipulation
capabilities. Their opinions must either be
changed or nullified, the former being preferable
to the latter. Through example and exposure
these managers must learn that (1) users can
effectively master basic automatic data
processing skills and accomplish data processing
tasks on their own, and (2) just as eliminating
the need for a telegraph key operator did not
eliminate jobs in the communication industry, so
will allowing the users to aid themselves not
threaten the job security of the data processing
professional.

4.5 Employ Progressive Analytical 
Techniques 

Although extending computing capabilities to 
the users will do much to satisfy their data
processing demands, it will by no means replace
the requirement that your organization develop
the more complex automated systems that support
your organization. In effectively performing
this function, it is critical that the data
processing professional understand what the user
wants and needs; that is, ensure that the system
designed and implemented is the system that will
meet the organization's needs. As previously
discussed, this can be a very elusive goal since
users generally are not fully aware of their
needs nor of how such needs can best be fulfilled
through the use of computers.

Computer applications contain three basic
functions: (1) data gathering, (2) data
manipulation, and (3) information dissemination.
Failure of the latter function equates to failure
of the system in fulfilling a particular business
requirement. Thorough understanding of user
requirements with regard to the latter function
will dictate an effective system design. Systems
analysts using terminals, sample data, and 
variable screen formats can learn, along with 
users, what information is needed to fulfill each 
business function. This approach allows the user 
to see how the application output will look, 
evaluate whether it meets the organization's
needs, and if not, suggest and shortly thereafter
review proposed changes to the output. This
approach will avoid much retrofitting often
required in the latter stages of a system
design. It will also more effectively guide the
analysts in determining what data must be
gathered and how it must be compiled and
manipulated. The final result will be a system
which satisfies organizational requirements.

4.6 "Minute Man" Response for
Unanticipated Tasks

Proper planning is critical in order to 
provide effective service to data processing 
users. However, data processing tasks will arise 
which were simply unanticipated. Such unantic- 
ipated tasks may be time sensitive and critical 
to the successful functioning of your organiza- 
tion. Forcing such tasks into the routine 
program development/maintenance workflow is 
disruptive and often counterproductive.

If an unanticipated task is both time
sensitive and critical to the organization's
function, it should be referred to a "minute man"
team for execution and implementation. Even
though it represents a substantive investment,
the team nucleus, composed of an aggressive
manager and two or three top-rated technicians,
should be a permanent organizational element.
Depending on the nature of the task, this team
should be able to draw selected talent from other
organizational elements. Upon completion of a 
particular task, maintenance of the resulting 
system should become the responsibility of the 
cognizant line organization, borrowed talent 
should return to their respective organizations, 
and the team nucleus should begin preparing for 
the next task. 

Benefits of this approach are: 

(1) the assembled team will appreciate the 
urgency of the task, 

(2) the talents applied to the project are
those required to rapidly accomplish it, 

(3) disruptions to other projects are 
minimized, and 

(4) the needs of the organization are more 
rapidly served. 

But remember, the task assigned to such a team
should be unanticipated, time sensitive, and
critical. Use of the team must be the exception,
not the rule.
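
The referral rule above can be written down as a small check. The following is only an illustrative sketch in Python; the `Task` record, its field names, and the queue labels are assumptions made for the example and are not part of the paper. A task goes to the "minute man" team only when all three conditions hold; anything else stays in the routine workflow.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of an incoming user request."""
    name: str
    unanticipated: bool
    time_sensitive: bool
    critical: bool


def route(task: Task) -> str:
    """Return the queue that should handle the task.

    All three conditions must hold before the team is used, so that
    use of the team remains the exception, not the rule.
    """
    if task.unanticipated and task.time_sensitive and task.critical:
        return "minute-man team"
    return "routine workflow"


if __name__ == "__main__":
    print(route(Task("regulatory change due Friday", True, True, True)))
    print(route(Task("new report layout", True, False, False)))
```

A request that is urgent but was foreseeable, or unforeseen but not critical, fails the test and is scheduled normally.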

4.7 Rely on Productivity Enhancing Tools

Productivity aids can be used to reduce the 
time and cost of developing and documenting 
systems. Such tools range from natural language
programming packages to programmer work stations
that support interactive compiling, editing and
debugging. Use of these tools can, given the 
particular task, more than double programmer 
productivity. 

A clearing house function should be 
established within your shop to review all user 
requests that require programming to determine 
whether code-generating tools can simplify the
task. Once such a determination is made the task
should be assigned to the cognizant project
manager. The clearing house function removes
this decision from the project manager who,
because of familiarity, may elect standard coding
techniques over a more advantageous code-generating
package. Whenever possible,
management's objective in this area should be to
supplant people-produced code with that generated
by machine.

4.8 Aggressively Pursue A Software 
Improvement Plan 

If your data processing shop is
representative of the industry norm, 70 to 80
percent of your programmers' and analysts' time
is spent maintaining existing code, consequently
leaving few resources for new efforts.
Furthermore, your organization can afford neither
the dollars nor the time required to completely
redesign/replace those systems that require such
heavy maintenance. Because of large capital 
investments in old application code, your 
organization is in a "darned if you do; darned if 
you don't" dilemma. 

A well thought-out software improvement plan 
will allow you to salvage a large portion of your
capital investment while reducing overall
maintenance costs. Application code can be
improved without a complete system redesign. In
order to accomplish this, each application must
be analyzed to determine what modules account for
the greatest portion of maintenance costs. These
modules should be targeted for major modification
or replacement. Care should be taken in avoiding
the logic that machine language code is costly to
maintain. If such code is basically static, it
costs little or nothing to maintain. The
governing rule should be: rework those modules/
programs that require frequent and time consuming
maintenance. And during each effort, exercise
care not to produce more code which will be
difficult to maintain.
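
One way to carry out the analysis just described is to rank modules by recorded maintenance effort and select the few that account for most of it. The sketch below, in Python, is a minimal illustration under stated assumptions: the module names and hour figures are invented, and the 80 percent cutoff is an arbitrary example threshold, not a figure from this paper.

```python
def maintenance_targets(hours_by_module, share=0.8):
    """Return the costliest modules, in descending order of cost, whose
    combined maintenance hours first reach `share` of the total."""
    total = sum(hours_by_module.values())
    ranked = sorted(hours_by_module.items(),
                    key=lambda item: item[1], reverse=True)
    targets, running = [], 0.0
    for module, hours in ranked:
        if running >= share * total:
            break
        targets.append(module)
        running += hours
    return targets


if __name__ == "__main__":
    # Hypothetical maintenance hours logged per module over one year.
    hours = {
        "PAYCALC": 420,   # changed frequently
        "EDITCHK": 310,
        "RPTGEN": 150,
        "TAPEIO": 15,     # static machine-language routine
        "SORTKEY": 5,
    }
    print(maintenance_targets(hours))
```

Note that the static routine never makes the list regardless of the language it is written in, which is the point of the governing rule: the selection is driven by how often and how expensively a module is changed, not by how old or low-level it is.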

The point is not to suggest how to improve 
your particular software. The point is to
suggest that it can be, that it must be, improved
through an evolutionary process. The first step
is most difficult: establishing a software
improvement project, and committing the
organization to specific software improvement
goals. Management must insist that all system
changes be accomplished within the context of
this plan. Long-term objectives accompanied by
discipline, perseverance, and accountability will
result in more maintainable software, more
efficient code, less dependence on a particular
individual's knowledge, and more rapid response 
to user needs. 

4.9 Utilize a Highly Disciplined Systems 
Design Methodology 

The problems of poorly documented, aged, and 
patched programs that require more effort to 
maintain and modify than was required to
initially develop are not the result of a
devious conspiracy by previous managers to
undermine the data processing operation. To the
contrary, these problems have resulted because
dedicated managers and technicians, in the name 
of expediency and user support, employed 
unorthodox techniques and shortcuts. In the 
development phase, they did more with less, so 
today we do less with more. 

In recognition of the problem, companies 
have developed and now market highly structured 
approaches to govern and guide development 
efforts. Although these approaches differ in
specifics and degree of automation, they are
common in the following aspects:

a. A life cycle step-by-step approach to 
systems development is prescribed. 

b. User responsibility/participation in the
requirement definitions, design alternatives,
and functional specifications
is mandated.

c. Form and content of the technical
specifications are defined.

d. User and data processing management
review and concurrence at predefined
intervals are specified.

e. Documentation standards are precise and
unyielding.

These approaches provide a simple but rigid
skeleton which if built upon will support the
muscles and organs of a complex system. Systems
developed using these approaches are better
documented and less complex than those designed
using less structured techniques. Consequently,
these systems should be less costly to maintain
and modify.

New systems development projects that are
the responsibility of the data processing
organization should be accomplished within the
confines of one of these structured approaches.
The organization selecting a highly disciplined
system design methodology, committing to the
overhead associated with its use, and holding
managers accountable for enforcing its
provisions, will not have to spend its tomorrows
trying to undo yesterday's expediencies.

5. The Costs of Change

Adaptation and implementation of this
approach to user support may require numerous
policy, organizational, and staffing changes.
Resources typically devoted to the programmer and
analysts functions will have to be diverted to
communications network support, database design
and maintenance, marketing and training and one
other topic not yet discussed: capacity
management and resource acquisition. The latter
is mentioned because this approach will require
substantial amounts of computer hardware and
software. The approach requires the application
of computer power to data processing tasks in
order to make them more efficient. Data
processing managers thought little of using
computer capacity to automate payroll or
inventory management functions. They must be
equally bold in automating their own efforts.
The result will be a more satisfied and better
served user, a more effective organization, and a
more efficient data processing shop. Remember,
there is more to data management and
communication than tapping a telegraph key.

STANDARD COSTING FOR ADP SERVICES



David R. Vincent

This paper is an initial exploration into the area of standard costing for
ADP services. Historically, data processing expenses have been regarded as variable,
depending on hardware and software procurement and usage. This paper takes
the position that information systems expenses are relatively fixed and that information
is the resource to be managed, as opposed to hardware and/or software.

Key words: Accounting methodologies; ADP services; data processing; data transfer;
information resource characteristics; management of the database; standard costing;
storage of the information asset.



The information systems community has proliferated
numerous "accounting" methodologies,
techniques and even packaged software products
that have been designed to calculate and report
the "true" ADP/EDP/IS unit, standard, and/or
chargeback basis for the costs of operations.
The sad fact is that none of these methodologies,
techniques, or products reflects the real
nature of information systems costs the way that
they exist in today's environment, and, more
importantly, the way in which we will view information
systems costs when we begin to consider
that it is information that is the resource
as opposed to just the hardware and software
used to make that information available to the
user.

Moreover, I/S expenses tend to be relatively 
fixed when we consider them on a 
month-to-month basis. This is the way in which top 
management tends to view the I/S investment, 
and it is one that they wish to minimize in 
order to live within the fiscal limitations 
imposed by the annual funding cycle. The irony 
comes with cost reports that show unit costs 
and chargeback amounts in government 
environments where they must "charge users with the 
cost of their usage." With these kinds of 
reports and "MIS-information" comes the concept, 
at lower levels of the organization, that 
information systems costs are variable. Armed with 
this often dangerous information, users will tend to 
minimize their use of the "EDP resource," many times 
by taking their work to outside service bureaus, 
which causes the I/S costs to the rest of the 
users (and to the organization as a whole) to 
go up! Certainly, the people who drafted OMB 
Circular A-121 didn't have this in mind when it 
was formulated. 
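
The effect described above can be illustrated with a small arithmetic sketch. The figures below (a fixed monthly cost of $100,000 and four users with equal workloads) are hypothetical and do not come from the paper; the function names are likewise illustrative. The sketch only shows that when a relatively fixed cost is recovered through unit charges, the departure of one user raises the unit cost for everyone who remains.

```python
from typing import Dict

# Hypothetical figures for illustration only; none come from the paper.
FIXED_MONTHLY_COST = 100_000.0  # relatively fixed I/S expense, in dollars


def unit_cost(fixed_cost: float, usage_by_user: Dict[str, float]) -> float:
    """Spread a fixed cost over all units of work processed in-house."""
    return fixed_cost / sum(usage_by_user.values())


def chargeback(fixed_cost: float, usage_by_user: Dict[str, float]) -> Dict[str, float]:
    """Charge each user in proportion to usage, recovering the full fixed cost."""
    rate = unit_cost(fixed_cost, usage_by_user)
    return {user: rate * units for user, units in usage_by_user.items()}


before = {"A": 250.0, "B": 250.0, "C": 250.0, "D": 250.0}
# User D takes its work to an outside service bureau; the fixed cost stays.
after = {user: units for user, units in before.items() if user != "D"}

print(unit_cost(FIXED_MONTHLY_COST, before))  # 100.0 dollars per unit
print(unit_cost(FIXED_MONTHLY_COST, after))   # about 133.33 dollars per unit
```

Under these assumptions each remaining user's bill rises from $25,000 to roughly $33,333 although that user's own workload has not changed.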

Traditional standard costing principles were 
developed in the manufacturing environment to 
account for variable material, labor, and 
overhead on a unit of manufacture basis. In the 
factory situation, material is not used unless 
there are units to produce, and labor is not 
called in or kept on the job unless there are 
units to produce. Most of the overhead goes on, 
so there may be an overhead absorption problem, 
but variable costs may be inventoried or avoided 
simply by not purchasing. Similarly, 
traditional governmental accounting has always been aimed 
at a somewhat variable personnel and materials 
expense budget. 

In the information systems environment, the 
truly variable costs are few, such as paper, 
reels of tape, electricity, and any temporary 
portion of labor. All the rest of the costs are 
relatively fixed, such as hardware, software, 
development, staff, facilities, and management. 
Even with all the good intentions of GSA, an 
increasing fixed investment in software will not 
allow the hardware expenditures to become 
variable. 



The only real variable involved is the work 
that can be processed by these relatively fixed 
costs, which is analogous to distribution of 
manufacturing overheads. A more appropriate method 
for understanding and analyzing information 
systems costs is, therefore, greatly needed. We 
also need to redefine what we are trying to 
analyze as the information systems environment 
changes from focusing on a centralized, 
fixed-cost/overhead, support function to analyzing the 
investment, cost of creating, and cost of 
maintaining the organizational resource, information. 

The first step toward resolving this issue 
lies in understanding that information systems 
is only a small component of a much larger 
information resource. The primary sorts of 
information resource activity that will fall under the 
aegis of what is currently called the Director 
of ADP (and is likely to be called the Chief 
Information Officer in the future) may be de- 
scribed by the following four major areas: 

• Storage of the information asset 

• Management of the database 

• Data processing 

• Data transfer 



1. Storage of the Information Asset 

With the growth of the organizational 
information asset, the storage of that resource 
takes on new significance. It is the author's 
opinion that by the end of this decade, the 
accounting community will insist on reporting 
the information asset as a fixed asset of the 
organization just like any other asset (e.g., 
cash, equipment, buildings, inventory) and that 
this asset will represent the single largest 
asset for most governmental organizations. This, 
coupled with the advent of fixed disks on direct 
access storage devices that consume more and 
more physical space, causes this asset to 
represent a larger and larger investment to the 
organization. Other types of information storage 
exist such as tape and mass storage devices, but 
with the increasing use of information on a 
real-time basis, their appropriateness will diminish 
(one possible exception is an emerging technology 
involving the use of optical discs that may be 
removable, storable, and shippable, and are 
rumored to cost less than tape). The 
appropriateness of data to be stored at all, as well as the 
decision as to which method of storage is 
appropriate, will be resolved only by analyzing the 
costs of storing information as well as the costs 
of database management for the organization 
requiring information storage. 

THIS PAPER ASSUMES THAT TOP MANAGEMENT AND THE 
USERS OF INFORMATION ARE THE ONLY GROUPS THAT CAN 
ASSIGN VALUE TO THE INFORMATION RESOURCE. IT IS 
UP TO INFORMATION SYSTEMS MANAGEMENT TO PROVIDE 
THE USERS AND TOP MANAGEMENT WITH SUFFICIENT 
COST INFORMATION TO ENSURE THAT THEY WILL BE ABLE 
TO MAKE INFORMED COST AND INVESTMENT DECISIONS, 
ESPECIALLY THOSE REGARDING TRADITIONAL EDP 
RESOURCES SUCH AS CPUS AND THE LIKE (this was the 
real intention of OMB Circular A-121). 
2. Management of the Database 

Database management includes all the hard- 
ware and software necessary to make that infor- 
mation available for retrieval. The security of 
the data and data integrity are ensured by the 
adequate management of the information. This 
will be the most difficult area to manage in the 
future. Visionaries in this area such as James 
Martin, Bill Synnott, and Bill Inmon, for example, 
spend most of their time trying to develop 
long-range thinking regarding these demands. For 
those of you concerned with sub-second response 
time, this area will provide the greatest 
challenge. What is seen today with systems such as 
IMS, IDMS, etc. can only provide an inkling of 
what is to come. We are already hearing much 
about relational and distributed vs. centralized 
databases. 

3. Data Processing 

Users have traditionally used the central 
processing facility for their processing needs. 
With the advent of minicomputers, some began to 
establish their own processing resource, while 
continuing to utilize the central processor as 
well. With the increasing proliferation of 
personal computers, it will become even more popular 
to process one's own information locally, 
especially with the heightened requirement of 
response time. Therefore, it will be necessary 
to assist the users in the tradeoff decision as 
to whether they will use centralized or 
distributed processing resources by providing them with 
central processing cost data. This can be 
compared with their local processing costs as well 
as the advantages of sub-second response time. 
However, there may still be costs for information 
storage and management from a central facility. 
These costs should be analyzed from the 
trade-offs concerning local versus remote storage 
(and what the corporate policy is regarding this) 
as well as the costs to ship the information 
where it will be used, whether that be over a 
telecommunications network, by express or 
regular mail, or whatever. This cost category is 
covered in the next segment regarding the 
transfer of data. 

4. Data Transfer 

In the case of a centralized data base, or 
even in the case of distributed data bases, there 
is the need for data transfer when there are 
distributed terminals or processors. This cost 
is essentially a communications cost and will 
become extremely important as file transfer 
becomes more popular due to the use of personal 
computers to do local processing. The author 
feels that this will be the major information 
cost of the future (even though the advent of 
optical discs may provide some temporary relief). 

The characteristic behavior of each of the above 
areas of information is unique. The methods of 
measuring, costing, and analyzing each area will 
require separate considerations such as local 
versus remote processing as well as information 
storage, data management and relevant 
communications (data transfer) trade-offs. Table 1 sets 
forth the salient characteristics for each area 
to be used as a guideline. A metric for each is 
stated, but this does not mean that the 
measurements from the four different areas of activity 
may be added to express an overall measure of 
capacity as has been expressed by such concepts 
as software physics, computer resource units, 
MVS service units, and the like. Adding the 
four areas together would obfuscate the unique 
characteristics of each. Even though clever 
people can think up some kinds of algorithms to 
explain system behavior with combined statistics, 
top management and the users of information will 
not be able to understand the algorithms or be 
able to make easily the important trade-off 
decisions regarding the I/S investment. This kind of 
reporting would be analogous to the utilization 
reports put out by many I/S departments showing 
such things as CPU utilization in percentages. 
What kinds of decisions in today's environment 
can be made on that kind of information? 

The costs for storage of information will 
be associated with the type of media used and 
the space occupied. The alternate metric of 
bytes may be the way that space is defined as 
opposed to tracks, tapes, etc. For the purposes 
of this discussion, it really doesn't matter. 
What does matter is how to relate the investment 
made in storage to how it is used to satisfy 
users' information needs. 

Database management costs should include 
all the I/S resources needed to get information 
into and out of storage. The kinds of resources 
used will include CPUs, disk and tape devices, 
controllers, and various software and operating 
subsystems. The important thing is that the 
database management costs will be associated with 
the time that an I/O system is used, the amount 
of time that a device is used, or, with the 
advent of cache memory devices, the time that 
cache is used. This would also apply to any 
other media such as mass storage or tape devices. 
These costs (not to be confused with storage 
costs) are for the retrieval of information as 
opposed to the cost of storing the data. 

The data processing cost, as mentioned 
before, is directly associated with the central 
processor and its operating systems, personnel, 
and other overheads. It may also be associated 
with the distributed processors, which are very 
easy to cost since they are fully located in 
user departments. This is especially true in 
governmental organizations, whose procurement 
policy makes it much easier to purchase micro- 
or minicomputers within an organization. 

The transfer costs are comparable to 
telephone costs and must represent the amount of time 
that a communication link is used. This link 
should also include the distance characteristic, 
because communications costs go up proportionately 
with distance. 



In summary, the use of this kind of revised 
analytical approach to management and costing will 
enhance the effort by making it understandable to 
both users and management. It also makes it 
possible to relate the costs to the burgeoning 
organizational resource known as information and 
the trade-offs that exist in the procurement 
of ADP resources. 

Table 1. Information resource characteristics 

Metric      Storage   Management   Processing   Transfer 

Basic       space     time         time         time/distance 
Alternate   bytes     bytes        bytes        bytes/distance 
Other       media     system       peak/time    peak/time 
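
As an illustration of the four-area approach summarized in Table 1, the following sketch keeps one usage record per area, each in its own metric, and reports them side by side rather than adding the measurements into a single composite unit. The class name, the units (track-months, I/O hours, CPU hours, hour-miles), and all rates are hypothetical choices made for this example; the paper does not prescribe them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AreaUsage:
    area: str        # "storage", "management", "processing", or "transfer"
    quantity: float  # amount consumed, expressed in the area's own metric
    metric: str      # unit name, loosely following the basic metrics of Table 1
    rate: float      # hypothetical cost per unit of that metric, in dollars

    @property
    def cost(self) -> float:
        return self.quantity * self.rate


def report(usages: List[AreaUsage]) -> List[str]:
    """One line per area; quantities in different metrics are never added."""
    return [
        f"{u.area:<11}{u.quantity:>10.1f} {u.metric:<14}${u.cost:>10.2f}"
        for u in usages
    ]


# All quantities and rates below are invented for illustration.
usages = [
    AreaUsage("storage", 400.0, "track-months", 0.50),   # space occupied
    AreaUsage("management", 12.0, "I/O hours", 30.0),    # time devices are used
    AreaUsage("processing", 3.5, "CPU hours", 200.0),    # processor time
    AreaUsage("transfer", 2.0 * 500.0, "hour-miles", 0.04),  # time x distance
]

for line in report(usages):
    print(line)
```

Each line can be read by a user or manager on its own terms, which is the property the paper argues is lost when the four areas are combined into one measure.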
AUTOMATING CONFIGURATION MANAGEMENT 



Enrique G. DeJesus, 1Lt USAF 
Craig J. Riesberg, 1Lt USAF 



HQ MAC/ADCI 
Scott AFB, Ill. 62225 



The ability to manage Software and Hardware Changes at remote sites is one 
of the vital components in any computer network. Four elements of 
configuration management are addressed: Change Control, Validation Testing, 
Inventory Management and Software Distribution. 

The first element, Change Control, provides a complete audit trail. Its 
functions include, but are not 
limited to, automatic numbering, cataloging incomplete requests and journalizing 
historical data upon completion. 

The second element, Validation Testing, provides for internal software 
driven flexible benchmark type testing, or an external remote terminal emulator 
which also uses the flexible benchmark concept but adds stress testing 
capability. 

The third element, Inventory Management, deals with the particular 
applicability of the change to the site, taking into consideration the hardware 
configuration. 

The fourth element, Software Distribution, handles automatic shipment of 
bundled software to sites configured on line. Further, it provides automatic 
preparation for shipment via the most expeditious means available to sites not 
accessible through the computer network. 



Key words: Change control; inventory management; software 
distribution; validation testing. 



1. Introduction 



Most federal agencies and civilian 
organizations rely heavily on their computer 
networks to accomplish daily operations. With 
the current impetus toward distributed 
processing, changing bundled software at remote 
sites becomes an ever increasing problem. Not 
only is it necessary to transfer new software to 
the geographically separated locations, it is of 
utmost importance that the software perform as 
expected. In today's rapid-paced data 
processing environment, operational systems rely 
on timely and adequately tested software. This 
paper discusses a system which: 1. allows 
users to request changes and enhancements 
and at the same time provides the host 
organization the means to control those 
requests (change control), 2. tests the 
newly-created software prior to field 
implementation (validation testing), 
3. allows the host organization to control 
the software and hardware inventory in the 
field (inventory management) and 
4. provides various means for the 
distribution of bundled software (software 
distribution). The system is modularized 
so that any organization can implement any 
module independent of any other. 
2. System Overview 



The entire process of changing software at 
remote sites can generally be broken down into 
eleven distinct and separate steps (see 
figure 1). In order to provide reliable 
software these steps must be followed in precise 
order. While several of the operations are 
outside the scope of the automated distribution 
system, the operations are enumerated to give 
the total view. Each step is listed along with 
its place in our system. 

1. Identification of Requirements - 
Performed by the user or system builder who 
identifies a new requirement or the need 
for an enhancement (outside of system). 

2. Control of Requested Changes - 
Performed by the software control agency, 
possibly with their order of importance 
(change control). 

3. Evaluation of Request - Must be an 
independent organization or individual to 
remain objective as to cost versus expected 
gain (outside of system). 

4. Analyzing and Coding New Software - 
Performed by host organization personnel or 
contracted to a software agency (outside 
of system). 

5. Development Testing - Conducted by a 
group supporting the writers. May only 
test the new software in a stand-alone mode 
without testing its interaction with other 
systems (outside of system). 

6. Documentation of New Software - 
Performed by writers (outside of system). 

7. Verification of Documentation - 
Performed by the software control agency. 
Determines if new software changes are 
supported in the user's/operator's manual 
(outside of system). 

8. Software Bundling - Performed by the 
software control agency. Will use source 
programs to provide bundled software for 
various types of machines (inventory 
management). 

9. Release Testing - Performed by the 
software control agency. Will test the 
interaction of new software with other 
systems by a benchmark or through the use 
of a remote terminal emulator (validation 
testing). 

10. Distribution - Performed by the 
software control agency. Forwards the 
bundled software to the sites via available 
and reasonable means (software 
distribution). 

11. Historical Documentation - Performed 
by system maintainer. Shows new changes 
and modifications to the existing system. 
A continuation of item number 2 (change 
control). 
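
The eleven steps listed above, and the requirement that they be followed in precise order, can be sketched as a small table plus an ordering check. This is only an illustration: the names Step, STEPS, next_step and steps_for are hypothetical, and the paper does not describe how its system represents the steps internally.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    number: int
    name: str
    module: Optional[str]  # None marks a step outside the automated system


STEPS = [
    Step(1, "Identification of Requirements", None),
    Step(2, "Control of Requested Changes", "change control"),
    Step(3, "Evaluation of Request", None),
    Step(4, "Analyzing and Coding New Software", None),
    Step(5, "Development Testing", None),
    Step(6, "Documentation of New Software", None),
    Step(7, "Verification of Documentation", None),
    Step(8, "Software Bundling", "inventory management"),
    Step(9, "Release Testing", "validation testing"),
    Step(10, "Distribution", "software distribution"),
    Step(11, "Historical Documentation", "change control"),
]


def next_step(completed: List[int]) -> Optional[Step]:
    """Return the next step to perform, refusing any out-of-order history."""
    if list(completed) != list(range(1, len(completed) + 1)):
        raise ValueError("steps must be completed in order, without gaps")
    if len(completed) == len(STEPS):
        return None  # all eleven steps are done
    return STEPS[len(completed)]


def steps_for(module: str) -> List[int]:
    """Step numbers handled by one module of the automated system."""
    return [s.number for s in STEPS if s.module == module]
```

For example, steps_for("change control") yields steps 2 and 11, matching the note that step 11 is a continuation of step 2.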



3. Change Control 



Software changes can be triggered by new 
user requirements or problems encountered 
during normal logic execution. In either case, 
they are among the forces that can affect the 
operational integrity of any computer system. 
These changes are usually organized in documents 
which serve as the official voice of the new 
requirements or modifications needed. These 
enhancements must be approved by the software 
control agency before they can be passed to the 
appropriate working agencies. 

It is extremely critical that the software 
control agency maintain an audit trail of the 
software enhancements being worked or already 
worked within the system or systems under its 
control. By the tracking of these documents, 
the software developer is kept informed of 
enhancements or changes that need to be 
accomplished in order to meet the user's 
corporate requirements. On the other hand, the 
document originator receives two levels of 
acknowledgement. First, the software control 
agency approves the request and passes it to the 
software developer. Second, after the 
requirement or fix has been satisfied by the 
developer, the result is a software release to 
the controlling agency with the appropriate 
documentation. This release is then passed back 
to the document originator. The change control 
module of our system will allow for: 1. the 
automatic numbering of documents, 
2. transmission and acknowledgement, and 
3. historical data management (see figure 2). 


Once a software enhancement has been 
identified, the user will have the capability to 
perform an online software change request. The 
automatic numbering module will accept the 
change request, assigning a unique number. By 
this number the system will be able to determine 
the document originator, system or subsystem 
affected, software developer, priority and the 
agency which will receive the new release. As 
soon as the document is accepted it will be 
passed to the transmission and acknowledgement 
module. This module is responsible for the 
electronic transfer of all active change 
requests to the respective software developer's 
working agencies. In addition, upon arrival of 
the software release package from the software 
developer, this module will update the change 
request database in accordance with the 
documentation provided in that package. The 
historical data management module will provide 
housekeeping reports of change requests overdue 
or outstanding for any period of time. 
Furthermore, it can supply the administration 
with total or partial cost of software 
enhancement or system development. 
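
A minimal sketch of the change control functions just described (automatic numbering, updating a request when its release arrives, and a housekeeping report of outstanding requests) might look as follows. The class names, field names and the in-memory dictionary are assumptions made for illustration; the paper does not specify the record layout or storage used by its system.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import count
from typing import Dict, List, Optional


@dataclass
class ChangeRequest:
    number: int            # unique number assigned automatically
    originator: str
    system: str            # system or subsystem affected
    developer: str
    priority: int
    receiving_agency: str
    submitted: date
    completed: Optional[date] = None  # set when the release package arrives


class ChangeControl:
    def __init__(self) -> None:
        self._numbers = count(1)
        self.requests: Dict[int, ChangeRequest] = {}

    def submit(self, originator: str, system: str, developer: str,
               priority: int, receiving_agency: str, submitted: date) -> int:
        """Accept a request and assign the next unique number."""
        number = next(self._numbers)
        self.requests[number] = ChangeRequest(
            number, originator, system, developer,
            priority, receiving_agency, submitted)
        return number

    def record_release(self, number: int, completed: date) -> None:
        """Update the request when the developer's release package arrives."""
        self.requests[number].completed = completed

    def outstanding(self, as_of: date, older_than_days: int = 0) -> List[int]:
        """Numbers of open requests submitted at least that many days ago."""
        cutoff = as_of - timedelta(days=older_than_days)
        return [r.number for r in self.requests.values()
                if r.completed is None and r.submitted <= cutoff]
```

Given the request number alone, the record supplies the originator, affected system, developer, priority and receiving agency, which is the lookup the text attributes to the numbering scheme.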

4. Validation Testing 

The Validation Testing module of our 
distribution system is designed to exercise and 
verify newly created software against the 
system or systems affected. 

Exhaustive testing of the software to be 
released is the key item for a successful 
implementation on the computer network. Our 
proposed system will: 1. build and maintain 
test scenario files for all systems specified, 
2. perform external or internal system tests, 
3. report the test findings to the test director 
using online or report (hard copy) files and 
4. evaluate the performance of the 
software during the preceding test (see 
figure 3). 

In order to effectively test a software 
release, we need to organize a baseline of all 
the transactions allowed in the system or 
systems in question. This baseline must 
identify the transactions in three major areas: 
software function performed, input data and 
expected output. The first function on the 
validation testing module allows the creation of 
a transaction driven test scenario with the 
capabilities of performing online additions, 
modifications or deletions of transactions 
already established. The accuracy of the 
contents of such scenarios will be the critical 
factor for a valid test evaluation and a 
successful implementation of the released 
software on the network. 

Another element that needs to be taken into 
consideration when building the scenario file(s) 
is the system test database(s). Cost versus 
reliability might be a factor when determining 
the size and contents of the test database. 

Once the scenarios are built, they are 
executed in the second function of the 
validation testing module. The execution of the 
test scenarios might be external or internal to 
the system(s) under test depending upon the host 
organization resources. The external driven 
test will use a Remote Terminal Emulator (RTE) 
as the test driver. The RTE will be loaded on 
hardware outside the system(s) under test 
environment, eliminating the competition for 
system resources once the test has been started. 
Some advantages in using the external test mode 
are the network operational simulation of inputs 
through the physical communication lines allowed 
in the system(s) under test and the ability to 
perform stress test not only on the system under 
test but on the network itself. A disadvantage 
might be the cost of the additional hardware 
used by the RTE to drive the system(s) under 
test. 



The internal driven test uses a software 
test driver to validate the release. The 
software driver shares the system(s) under test 
resources and passes the transactions specified 
on the test scenario file(s) in a single 
threaded mode to the appropriate application's 
software. 

Both testing methods mentioned above, 
external and internal, will validate each and 
every output received from the system(s) under 
test against the baseline output specified by 
the test controlling agency in the system test 
scenario file(s). Any mismatches are reported 
to the online test director's device established 
at the beginning of the test. This is the 
primary role of the third function of the 
validation testing module. The test director 
will have to make a decision concerning the 
test. Possible alternatives are: 1. continue 
the test, 2. retry the transaction that caused 
the mismatch, and 3. restart or stop the entire 
system test. Particular attention should be 
given to the system under test's database(s) 
when deciding further actions after a 
transaction mismatch is detected. 
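
The comparison of each output against the baseline, and the test director's choices on a mismatch, can be sketched as a single-threaded driver loop. Everything named here (Transaction, Mismatch, run_scenario, the director callback, the retry limit, and the stand-in system under test) is a hypothetical illustration; the paper does not give the driver's interface. The "restart" alternative is left out for brevity.

```python
from typing import Callable, List, NamedTuple


class Transaction(NamedTuple):
    function: str    # software function performed
    input_data: str
    expected: str    # baseline output from the scenario file


class Mismatch(NamedTuple):
    index: int
    transaction: Transaction
    actual: str


CONTINUE, RETRY, STOP = "continue", "retry", "stop"


def run_scenario(scenario: List[Transaction],
                 system_under_test: Callable[[str, str], str],
                 director: Callable[[Mismatch], str],
                 max_retries: int = 1) -> List[Mismatch]:
    """Pass transactions one at a time and report every mismatch."""
    mismatches: List[Mismatch] = []
    for index, txn in enumerate(scenario):
        retries = 0
        while True:
            actual = system_under_test(txn.function, txn.input_data)
            if actual == txn.expected:
                break
            mismatch = Mismatch(index, txn, actual)
            mismatches.append(mismatch)
            decision = director(mismatch)  # the test director decides
            if decision == STOP:
                return mismatches
            if decision == RETRY and retries < max_retries:
                retries += 1
                continue
            break  # CONTINUE, or retries exhausted
    return mismatches


def fake_system(function: str, data: str) -> str:
    """Stand-in for the system under test, for demonstration only."""
    return data.upper() if function == "upper" else data


scenario = [
    Transaction("upper", "ab", "AB"),
    Transaction("echo", "x", "y"),   # deliberately wrong baseline
    Transaction("upper", "c", "C"),
]
```

Calling run_scenario(scenario, fake_system, lambda m: CONTINUE) reports only the second transaction; a real driver would also need to consider the state of the test database before retrying, as the text cautions.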

The successful execution of the system(s) 
test scenario will provide high confidence in 
the operational reliability of the released 
software once loaded on the network. 
Furthermore, after the test of the operational 
capability, our validation testing module will 
spawn a Computer Performance Evaluation (CPE) 
package as the last validation function. During 
the execution of the system(s) scenario test 
file, CPE data can be collected and stored in 
different media (e.g. tape or disk) so that 
elapsed times for different functions 
(transactions) can be analyzed and compared with 
previous CPE results. Service times (the time it 
takes application software to work a requested 
function) as well as Response times (total 
elapsed time from user's terminal to 
application's software and return) can be 
analyzed so that unexpected results can be 
evaluated prior to release. 
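
A comparison of current elapsed times with previous CPE results could be sketched as below. The 10 percent tolerance, the function names and the sample timings are all hypothetical. The time_outside_application helper rests on an interpretation of the definitions above: since response time covers the whole round trip and service time covers only the application, their difference is taken as time spent outside the application software.

```python
from typing import Dict, Tuple


def regressions(previous: Dict[str, float], current: Dict[str, float],
                tolerance: float = 0.10) -> Dict[str, Tuple[float, float]]:
    """Functions whose elapsed time grew by more than `tolerance` (a fraction)."""
    flagged = {}
    for function, old in previous.items():
        new = current.get(function)
        if new is not None and new > old * (1.0 + tolerance):
            flagged[function] = (old, new)
    return flagged


def time_outside_application(response: float, service: float) -> float:
    """Response time minus service time (an interpretation, see lead-in)."""
    return response - service


# Invented response times in seconds, per transaction type.
previous_response = {"inquiry": 1.2, "update": 2.0}
current_response = {"inquiry": 1.25, "update": 2.6}

print(regressions(previous_response, current_response))
```

With these sample figures only "update" is flagged, because "inquiry" grew by less than the tolerance.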

Other levels of validation might be 
performed in addition to the four functions 
described. One such validation might be 
performed on the documentation provided by the 
software developer in the software release 
package. For example: the system operator's 
manual can be inspected to insure that new error 
messages or system commands are included in the 
package. 

5. Inventory Management 

One of the largest problems in business 
management is knowing what goes where and who 
gets what. Management of computer resources is 
no exception. The Inventory Management section 
of our system is designed to provide a complete 
inventory, both hardware and software, of each 
individual remote site. By utilizing the 
information gathered and stored by the inventory 
management module, the system is able to 
determine compiling and bundling requirements 
for various types of machines. This information 
is also shared with the distribution module 
during the shipping phase of a software release. 
Since it is information which is shared, the 
inventory management module allows for the 
storage of information to be used by other 
modules (e.g. site addresses for the 
distribution module which will be used to 
prepare mailing labels). The inventory is 
broken into two distinct divisions: hardware and 
software (see figure 4). 

Effective hardware inventory management can 
provide help in at least three different areas. 
These three areas (system documentation, cost 
analysis and capacity or upgrade planning) are 
intricately interwoven in the inventory 
management module. Each area can be developed as 
needed beginning with system documentation. 
System documentation provides the basic 
information needed about any remote site. It 
provides a list of what is currently available 
at the site in regards to CPU, memory, work 
stations and peripherals connected. By 
combining system documentation along with 
purchase, lease and maintenance contract costs 
the inventory management module can provide cost 
analysis data for any given site in any given 
configuration. After including more information 
concerning the given system's maximum capacities 
regarding memory and peripherals, it can provide 
a type of capacity planning as well as cost 
factors to support an upgrade to a higher 
capacity. 
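
The progression described above, from system documentation to cost analysis to capacity planning, can be sketched as one record per site. The field names, the units and the sample figures are hypothetical; the paper does not define the inventory record.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SiteHardware:
    # System documentation: what is installed at the site.
    site: str
    memory_kb: int
    peripherals: int
    # Contract costs, added to support cost analysis.
    monthly_lease: float
    monthly_maintenance: float
    # Maximum capacities, added to support capacity or upgrade planning.
    max_memory_kb: int
    max_peripherals: int

    def monthly_cost(self) -> float:
        """Cost analysis for the current configuration."""
        return self.monthly_lease + self.monthly_maintenance

    def headroom(self) -> Dict[str, int]:
        """Remaining capacity before an upgrade to a larger machine is needed."""
        return {
            "memory_kb": self.max_memory_kb - self.memory_kb,
            "peripherals": self.max_peripherals - self.peripherals,
        }


# Invented example site.
site_a = SiteHardware("Site A", memory_kb=512, peripherals=6,
                      monthly_lease=4000.0, monthly_maintenance=500.0,
                      max_memory_kb=1024, max_peripherals=8)
```

Each group of fields can be filled in as needed, which mirrors the statement that the three areas can be developed incrementally.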

Software inventory management, while 
providing report generation capabilities, is 
used primarily by other modules of the 
distribution system. The validation testing 
module uses the software inventory to determine 
which configurations need to be tested when new 
software is released. By maintaining a list of 
bundled software for each site, only those 
configurations which will ultimately receive the 
new software will be tested. It is also through 
this module that the releasing organization can 
accurately estimate the amount of time necessary 
to bundle, test and distribute new software. 

The software distribution module utilizes 
the software inventory in conjunction with the 
hardware inventory to determine which sites will 
receive the new software release after it is 
successfully tested. As stated earlier, it not 
only determines which software is to be released 
but which mode of transmission will be used to 
transport the software. The inventory 
management module maintains site addresses for 
postal and electronic mail. 
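
The way the distribution module might combine the software and hardware inventories to pick recipient sites can be sketched as a simple filter. The dictionary layout, the key names and the sample sites are assumptions for illustration only; the machine type names are invented.

```python
from typing import Dict, Set


def sites_to_receive(package: str, machine_types: Set[str],
                     inventory: Dict[str, dict]) -> Dict[str, str]:
    """Map each recipient site to the address type used to reach it.

    A site qualifies if it already holds the bundled package (software
    inventory) and runs a machine type the release was built for
    (hardware inventory).
    """
    selected = {}
    for site, record in sorted(inventory.items()):
        if package in record["bundles"] and record["machine"] in machine_types:
            selected[site] = "electronic" if record["on_network"] else "postal"
    return selected


# Invented inventory records.
inventory = {
    "Site A": {"machine": "type-1", "bundles": {"payroll", "supply"}, "on_network": True},
    "Site B": {"machine": "type-2", "bundles": {"payroll"}, "on_network": True},
    "Site C": {"machine": "type-1", "bundles": {"payroll"}, "on_network": False},
    "Site D": {"machine": "type-1", "bundles": {"supply"}, "on_network": True},
}
```

The same filter, applied before testing, would give the validation testing module the list of configurations that need to be exercised.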

By examining hardware and software 
inventory at sites containing more than one 
machine of the same make, it may be possible to 
reconfigure one of those machines so that two 
different systems could be run on it in a 
degraded mode. This hybrid backup system would 
be extremely useful in keeping an essential 
system operational during a period of machine 
failure. It could also be used to provide 
service during the relocation of a system. 



6. Software Distribution 



The software distribution module of our 
system is responsible for the automatic shipment 
of all new software after it has been bundled 
and tested. It insures that all remote sites in 
the network receive the version of the new 
software that is compatible with their hardware. 
The actual distribution of the software lends 
itself well to automation because it is 
generally repetitive work. Unique features of 
the distribution module include the following 
capabilities: 1. distribution to an operational 
test site prior to general distribution, 2. use 
of different modes of distribution based on the 
urgency and size of the released software, 
3. inclusion of instructions on how to load and 
operate the new software and 4. automatic 
acknowledgement (see figure 5). 

The first feature, the ability to 
distribute to an operational test site, is 
almost an extension of the validation testing 
module. In the validation testing module the 
new software has been rigorously validated 
against a test scenario. However, certain 
aspects of the new software, such as overflowing 
storage tables and indexes can only be exercised 
in an operational mode. Distributing the new 
software to an operational test site provides 
this additional mode of testing prior to its 
general release. If different sites execute 
different portions of the software, multiple 
sites can be chosen and controlled by the 
distribution module. After a specified period 
of time, such as 30 days, if there are no 
problems with the software it will be 
automatically released to all applicable sites 
in the network. An additional benefit of the 
operational test site is that it allows for the 
validation of loading instructions and 
user's/operator's manual by field personnel 
prior to general release. 

The second feature is the capability to use 
different modes of distribution based on the 
urgency or size of the release. Determining the 
mode of distribution will generally require an 
analysis of need versus cost. Generally, the 
fastest method of distribution will be the most 
costly. Available means of distribution include 
but are not limited to: physical distribution of 
tapes or disks and distribution via electronic 
mail or direct communication links. 

Most releases will be of a routine nature 
and can be accommodated by a relatively simple 
distribution system. In its most elementary 
state the system will copy the new software to a 
magnetic tape or disk, providing one copy for 
each site that is to be a recipient of the new 
software. It will then print a mailing label 
and a set of loading instructions for each site. 
The main advantage of this method is that it can 
handle the largest releases, including 
completely new operating systems. Additional 
advantages are that it minimizes shipping costs 
and the updated user's/operator's manual can 
accompany the software release package. The 
main disadvantage is that it is slow in moving 
software to the remote sites. Part of this can 
be attributed to the need to generate one copy 
for each site, and partly to the available means 
of transportation to remote sites, especially 
outside the continental United States. The copy 
and forward method can also be used to transfer 
hybrid backup systems (see inventory management) 
to be used during system failure. 

The second type of distribution is for time 
critical emergency releases. This type of 
distribution utilizes the existing communication 
links of the system or some special link to 
quickly transfer the new software to a 
particular site. Other factors to be considered 
when selecting this method include the size of 
the release, the number of sites which must 
receive it and the availability of the 
communication link for an extended period of 
time. The main advantage of this method is its 
speed. Emergency patches to software can be 
quickly transferred along with the loading 
instructions. Another advantage is the use of 
the network's own communication system to 
transmit a single copy to all remote sites 
concurrently. The main disadvantage is the high 
cost to transfer over long distances to numerous 
sites. Other disadvantages are that the network 
itself must be operational and that a large 
release could tie up the communications link for 
an extended period of time. 
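
The need-versus-cost choice between the two types of distribution can be sketched as a simple rule. The thresholds, parameter names and units are hypothetical; the paper names the factors (urgency, size of the release, link availability, network status) but gives no decision rule.

```python
def choose_mode(urgent: bool, size_kb: float, link_kb_per_hour: float,
                max_link_hours: float, network_up: bool = True) -> str:
    """Pick a distribution mode from urgency, release size and link limits.

    The link is used only for urgent releases that fit within the time the
    communication link can be tied up; everything else goes by tape or disk.
    """
    transfer_hours = size_kb / link_kb_per_hour
    if urgent and network_up and transfer_hours <= max_link_hours:
        return "communication link"
    return "tape or disk by mail"
```

A fuller rule would also weigh the number of receiving sites and the distance-related transmission cost, both of which the text lists as factors.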

The next feature is the software 
distribution module's capability to incorporate 
instructions into the release package. While 
the receiving site may have the knowledge to 
load its own software, this seemingly 
insignificant capability allows the sender to 
furnish specific instructions for each site. 
The loading instructions can have a significant 
impact on the software's usefulness. For 
example, new software might be time sensitive:
simultaneous loading of new communication
software may be required at all sites, and loading
at any other time may cause edit errors and
incorrect updates.

The final feature to be discussed is 
automatic acknowledgement. Two types of return 
acknowledgement should be built into any system. 
The first is acknowledgement of receipt of the
new software. If the distribution was through 
the network's communication links this can be 
automatic; otherwise, the receiving site must 
manually trigger a return message stating when 
the new software was received. It is important 
for the releasing agency to have a historical 



record of when and how each release was received 
by the remote sites. The second acknowledgement 
is generated by the remote site. It is an 
informational message telling the distributing 
office that the remote site has complied with 
the instructions and is or is not currently 
using the new software. If the remote site
encountered any problems in loading or operating
the new release, it may also include a narrative
description of the problem in this message.
This return information is stored by the 
software distribution module for a specified 
period of time. 

7. Concluding Remarks 



Any large organization deeply involved in 
distributed processing would benefit from an
automated configuration management system. 
After determining the need, but realizing that 
the resources are limited, the next question is 
logically "Where should we start?". Because 
operational sites are so dependent upon the host 
for valid software and because the organization 
itself may not survive without online data 
processing, the validation testing section is 
our choice for initial development. It assures 
the distribution system that the software being 
released is worth the effort; however, it is 
hard to distribute the software without knowing 
who needs it. For this reason we chose the 
inventory management module as the second step. 
The inventory management information must be in 
a format readily available for software 
distribution; therefore, it is natural for 
these two units to be developed concurrently, 
leaving change control as our last effort.
Regardless of order of development, each module 
should be as independent as possible to allow 
for future enhancement and redesign. 

Abstract



The Terminal Probe method has recently been used to compare
selected performance indices of different interactive computer
installations. Since these comparisons have been done under the
"natural" work load of the systems, several validity questions
arise.

In this paper we present results which summarize our
experience in 10 systems with more than 11,000 measurements made
over a span of two years. We analyze the empirical behavior of
response time and find system independent statistical properties
which enable us to attain predetermined confidence intervals for
our values. Moreover, we have found a family of statistical
models which fit our data in a comprehensive way: not only are the
mean and variance of our response time distributions well
approximated, but the modelled distributions fit the observed
ones. Thus, order statistics can also be obtained.

We may now control the data gathering period better, and,
what is more, we have a predetermined statistical confidence in
our curves. Thus, the robustness of the method is justified for
comparisons.

Key words: Benchmarking; generalized linear models;
installation comparisons; linear predictor; performance; 
performance indices; terminal probe; UNIX operating system; 
work load estimators. 



1.0 INTRODUCTION 



This research was supported in part by 
the National Science Foundation grant 
MCS-8012900 and by a Pontificia Universidad
Catolica de Chile research grant
DIUC 47/82. 



Over the years the so-called
Terminal Probe Method has been utilized
in a variety of contexts. Indeed, not
only has it been used to assess a
specific configuration under a
controlled work load, but there have
also been studies which use it as a tool
for comparing different installations
operating under their "natural" work
loads. Thus, questions regarding the
representativity and validity of these
comparisons have been posed. These
questions have remained unanswered.

In this paper we will present a 
study which shows that there exist 
system independent statistical 
properties of performance data, gathered 
using the Terminal Probe method, which 
enable us to draw firm conclusions from 
studies based on approximate work load 
estimators. Moreover, the study also 
shows how one can limit the data 
gathering period and how one may select 
some predetermined values of the work 
load estimators as sole objects of 
measurement, thus reducing the method's 
overhead even more. 

Our results are based on 11,813 
measurements taken on 10 systems over a
span of two years. All of the systems 
were presented with the same portable 
benchmark which runs on any UNIX system. 
Even though our benchmark gave us three
"time" measurements (user, system and
response time), we have limited our
discussion to response time.

Section 2 briefly explains the data 
gathering method and the systems 
studied. Section 3 gives a study of the 
correlations observed between our work 
load estimators. Section 4 summarizes
our study of the response time 
distributions, as functions of our work 
load estimators. Section 5 introduces 
our main statistical tool, the 
generalized linear models, and presents 
the approximations obtained with GLIM. 
Finally, we present our conclusions in 
Section 6. 



2.0 THE DATA GATHERING METHOD AND THE 
SYSTEMS MEASURED 



All of the installations used for 
our measurements run versions of UNIX 
operating systems. UNIX is a trademark 
for a family of operating systems 
developed at Bell Laboratories over the 
last 14 years [11,14]. In 1969 a first
version was implemented on a PDP-7, and
since then it has been ported to PDP-11s,
VAXes, IBMs, Amdahls, Interdatas, NCRs
and others. In 1979, a group at the
University of California at Berkeley
implemented a paged virtual memory
extension to UNIX for the VAX [1], which
is now part of what is known as
"Berkeley UNIX". Most of our
measurements were done on this last type
of system.

On logging in to a UNIX system each
user is assigned a special process 
containing a command interpreter, known 
as the "shell" [3], which listens to the 
terminal. The shell parses the input 
line and decodes the command requested 
along with its flags and arguments. 
Then it execs the command. Shells may 
also read commands from files. Thus 
users may define sequences of shell 
commands, known as shell scripts, and 
store them in files for later 
invocation. The versatility of these
scripts is greatly enhanced by the fact
that the shell language also contains
control-flow primitives, string-valued
variables and arithmetic facilities.
Since UNIX automatically handles all 
file allocation decisions, the 
portability of these scripts is greatly 
facilitated. 

The strategy for monitoring 
responsiveness of each system was the 
same as that used in [4]. It consists
of running a script which has a set of 
predefined benchmarks together with 
commands which gather statistics about 
the work load and measure the time it 
takes the benchmark to terminate. The 
script runs periodically in a totally 
automatic way. Each time the script 
cycles through its commands, it executes 
a "sleep" command that suspends its 
execution and then wakes it up after a 
predetermined number of seconds. We 
decided to use the time command because
of our commitment to use standard UNIX
tools. Time has a rather low resolution
and truncates; it does not round off.
More details can be found in [5,6].
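
The measurement cycle just described can be sketched as follows. This is an illustration in Python rather than the shell script actually used; the names probe_once and run_probe, and the injected benchmark, sample_load, clock and sleep callables, are assumptions introduced here. Elapsed time is truncated to whole seconds to mimic the behavior attributed to the time command.

```python
import math
import time


def probe_once(benchmark, sample_load, clock=time.time):
    """Run one benchmark and return the load seen plus the truncated elapsed time."""
    load = sample_load()                  # e.g. {"nu": 12, "np": 85, "nau": 7}
    start = clock()
    benchmark()                           # the task whose response time is measured
    elapsed = clock() - start
    record = dict(load)
    record["rt"] = math.floor(elapsed)    # truncate, do not round
    return record


def run_probe(benchmark, sample_load, cycles, period,
              clock=time.time, sleep=time.sleep):
    """Repeat the probe `cycles` times, sleeping `period` seconds after each run."""
    records = []
    for _ in range(cycles):
        records.append(probe_once(benchmark, sample_load, clock))
        sleep(period)                     # suspend until the next sampling instant
    return records
```

Passing the clock and the sleep function as arguments is only a convenience of this sketch: it lets the loop be exercised without waiting for real time to pass.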

This data gathering method can be 
categorized as a time-sampling method 
[9] and is in fact very similar to
Karush's terminal probe method [10]. By
using it with a system in normal
operation one evaluates the performance 
of an installation. 

Table 1 presents a summary of the
number of measurements for the various
systems. Hardware changes were made to
an installation in some cases. For
statistical reasons we had to consider
the systems which existed before and
after the change as being different.
(For example, in Table 1 VAA and VAB
correspond to two such systems. This
change, a fairly trivial one, was the
addition of some ports to the
installation.) The most important
mnemonic element in the names is given
by the first letter. Those names which
begin with a P are PDP-11 systems.
Those with a V are VAX systems. All
VAXes are 11/780 [7]. If a 7 appears
it means 11/70 and a 4 means 11/40.

System    No. of measurements
VI        (illegible)
P7A       3206
VM        (illegible)
P7B       (illegible)
VAA       844
VC        791
VB        435
P4A       301
P4B       235
VAB       91
Total     11,813

Table 1: Systems measured and
the number of observations.


There were a total of four 
different installations, three at the 
University of California, Berkeley, and 
one at Purdue University. Those 
measurements labeled VC come from 
Purdue, where the VAX 11/780 had, at
that time, a configuration with 3M bytes
of main memory, 56 ports, three RM03
disk drives on Massbus 0 and one TE16
tape drive on Massbus 1. Measurements
from P4A and P4B came from a PDP 11/40
which had 200K bytes of main memory, one
DIVA disk controller and three DIVA disk
drives with 50M byte disks. This
installation had 23 ports and no 
floating point arithmetic unit. The P4B 
measurements were made on the 
installation without a cache memory, and 
the P4A with a 2K cache memory. The P7B 
measurements were made on a PDP 11/70 
with 1.3M bytes of main memory, 2K cache 
memory, one DIVA disk controller with 
four DIVA disk drives and an RS04 fixed 
head disk used as swapping device. This 
installation had 81 ports. The P7A 
measurements were made on the same 
installation with 2M bytes of main 
memory and where the drum was used for 
storing temporary files instead of as a 
swapping device. 

All of the other measurements were
made at the same installation, which went
through successive changes. VB was an
11/780 with 512K of main memory, two
RP06 disk drives, one TE16 tape drive
and 16 ports. This was a "swapping"
UNIX system. VAB was the same
installation but running a paging 
version of UNIX. VAA was VAB with 8 
more ports installed. VM had 2M bytes 
of main memory, 2 RP06 disk drives, two 
CDC 300M byte disk drives and 32 ports. 
VI had 4M bytes of main memory, 4 RP06 
disk drives, two CDC 300M byte disk 
drives and 72 ports. 



3.0 CORRELATIONS AMONG WORK LOAD 
ESTIMATORS 



Throughout our data gathering
period we ran a script which contained
three basic tasks: a C compilation, a
CPU bound job and a text formatting job.
We also measured three work load
estimators: the number of users logged
in (nu), the number of processes in the
process table (np), and the number of
active users (nau). Nau was obtained by
counting the number of ports which had
more than one process associated with
them. This generated a bias in systems
where there were several daemons
running, but no correction was made
because the same number existed during
the entire data gathering period. We
also measured the global response time
of this mix, which we called script.
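
As an illustration of how the three estimators relate to one snapshot of the system, the following Python sketch derives nu, np and nau from a list of logged-in ports and a list giving the port of each process. The function name and the input representation are assumptions made here; in particular nu is counted as one per logged-in port, which may differ from how a given system reports users.

```python
from collections import Counter


def work_load_estimators(logged_in_ports, process_ports):
    """Return (nu, np, nau) for one snapshot.

    logged_in_ports: ports (terminals) that have a user logged in.
    process_ports:   one entry per process in the process table, giving the
                     port it is attached to, or None if it has no terminal.
    """
    nu = len(set(logged_in_ports))        # users logged in (one per port)
    np_ = len(process_ports)              # processes in the process table
    per_port = Counter(p for p in process_ports if p is not None)
    # A port counts as active when more than one process is associated
    # with it, i.e. something is running besides the login shell.
    nau = sum(1 for count in per_port.values() if count > 1)
    return nu, np_, nau
```

With three logged-in ports, of which two have more than one process, and one process without a terminal, the function reports nau = 2 while np counts every process. A process attached to a port with no interactive work would inflate nau in the same way as the bias mentioned in the text.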

At a later stage, and only for the
systems VI and P7A, we appended two new
tasks to exercise aspects of the systems
which were found not to be well
individualized. These tasks did not
alter the measurements taken of the 
initial mix. They were a copy of a 60K 
byte file within the same disk, and an 
editing session involving a series of 
commands made to a 60K byte file. 

The choice of these portable work 
load estimators has been documented 
[4,5,6]. Nevertheless, their 
correlations were now studied. As they 
are related to each other, it was of 
interest to decide if we could explain 
the same phenomena with just a subset of 
them. Table 2 presents the results 
obtained for the correlation 
coefficients corresponding to all the 
measurements in each system. 
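
For reference, the pairwise coefficients can be computed as in the following Python sketch. It assumes that the coefficient meant is the usual product-moment (Pearson) correlation and that each measurement is available as a (nu, np, nau) tuple; both the function names and that data layout are assumptions of this example.

```python
import math


def pearson(xs, ys):
    """Product-moment correlation of two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def estimator_correlations(samples):
    """samples: list of (nu, np, nau) tuples measured on one system."""
    nu = [s[0] for s in samples]
    np_ = [s[1] for s in samples]
    nau = [s[2] for s in samples]
    return {
        "nu vs nau": pearson(nu, nau),
        "nau vs np": pearson(nau, np_),
        "nu vs np": pearson(nu, np_),
    }
```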

From Table 2 we observe that there
exists substantial correlation between
the different pairs of estimators.
These coefficients are certainly larger
than those observed in the social
sciences, which are on the order of .4,
and smaller than those obtained in
economics, which are on the order of
.95. Moreover, we also see that the
pair which tends to be less correlated
is nu vs np, which suggests that these
two estimators measure different aspects
of the work load. We also observe that
for P4A and VAB, nau and np are much
more highly correlated than the other
two pairs. In Section 5 we shall see
that for most systems and tasks, nau
adds no significant additional
information to that given by nu and np.

          Correlation Coefficients
System    nu vs nau   nau vs np   nu vs np
VI          .899        .799        .689
P7A         .917        .849        .791
VM          .880        .854        .783
P7B         .946        .903        .884
VAA         .852        .838        .746
VC          .881        .769        .686
VB          .842        .855        .771
P4A         .677        .828        .660
P4B         .410        .345        .470
VAB         .492        .906        .606

Table 2: Correlation coefficients between work
load estimators for the different systems.




Using the SPSS facility called 
SCATTERGRAM [13], we were able to plot 
response time for our different tasks as 
an individual function of each of our 
work load estimators. Moreover,
SCATTERGRAM also gives us an idea as to
how each cross section distribution 
looks. It is unfortunate that it prints 
values only through the digit 9, but 
further analysis enabled us to plot the 
exact histogram, and hence the observed 
shape of each distribution. We shall 
discuss this in Section 4. 

Figure 1: System VI. Scattergram of the task man man.
Nau (AUSERS) versus rt (RESP).

Figure 2: System VI. Scattergram of the task man man.
Nu (NUSERS) versus rt (RESP).



In Figures 1, 2 and 3, we display
the scattergram of one task measured in
system VI, the text formatting one called
MAN MAN. The overall shape observed
here was typical of each scattergram we
made. Nau (AUSERS) always tended to
produce more compact distributions and
np (PROCEC) had a larger range.

Figure 3: System VI. Scattergram of the task man man.
Np (PROCEC) versus rt (RESP).

Two very important observations
should be made from these figures. 
First, the variance of the response time 
distributions increases as the work load 
estimator increases. Second, the 
distributions show a large degree of 
skewness towards the larger values of
the work load estimators. This means 
that large upper "tails" were observed. 
These two statistical characteristics of 
the behavior of response time applied 
consistently to all systems and tasks in 
script. 

Figure 4 has the scattergram of nu
(NUSERS) versus nau (AUSERS) observed in
system P7A. The most interesting
statistical feature of it is the
existence of two clouds. The principal
one (i.e., the one along the diagonal) was
typical of all of these scattergrams.
It graphically shows why the two
estimators have such a high correlation
coefficient. The secondary cloud, that
parallel to the horizontal axis, is a
small cluster of points which depicts
the system with a large number of users
logged in but very few doing work. It
must be said that this did not occur
often, because there are only 30 such
measurements out of the 3206 we had for
P7A. Most systems did not exhibit such
a distinguishable secondary cloud.



3.1 Robustness Of Our Correlation 
Coefficients 



The correlation between work load 
estimators is a function of the working 
habits of the installation's user 
community. Table 3 presents the 
correlation coefficients obtained from 
random subsamples in all systems. The 
size of the subsample and its percentage 
of the whole sample are indicated. 
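
A stability check of this kind can be sketched in Python as follows. The example draws a random subsample of roughly half the measurements without replacement and recomputes the coefficient; the function names, the fixed seed and the sampling scheme are assumptions of this sketch, since the text does not say how the subsamples were drawn.

```python
import math
import random


def pearson(xs, ys):
    """Product-moment correlation of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def subsample_correlation(xs, ys, fraction=0.5, seed=0):
    """Return (size, correlation) for a random subsample of the (x, y) pairs."""
    pairs = list(zip(xs, ys))
    k = max(2, round(fraction * len(pairs)))
    # Sampling without replacement; the seed only makes the run repeatable.
    chosen = random.Random(seed).sample(pairs, k)
    sub_x = [p[0] for p in chosen]
    sub_y = [p[1] for p in chosen]
    return k, pearson(sub_x, sub_y)
```

Comparing the value returned for the subsample with the coefficient of the full sample, for each estimator pair, gives the kind of comparison made between Table 2 and Table 3.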

By comparing Table 3 with Table 2 
we see that there is indeed great 
stability in these numbers. Another 
remarkable fact is that all relative 
orderings between coefficients are 
preserved. In each system the ordering
of the correlations coincides. This
clearly indicates a high degree of
continuity in user habits. On the other
hand, some hardware changes do bring
different behavior patterns, as can be
seen with P4B and P4A, VAB and VAA.
From an overall analysis of the behavior 
of each installation which underwent 
hardware changes, we were able to detect 
changes in the user habits only when the 
hardware additions significantly altered 
the responsiveness of the system at all 
levels of the work load. 

          No. of   % total   Correlation Coefficients
System    meas.    meas.     nu vs nau   nau vs np   nu vs np
VI        1708     49.78       .896        .785        .668
P7A       1471     49.71       .909        .849        .795
VM         740     50.04       .880        .860        .789
P7B        515     50.06       .948        .899        .885
VAA        425     50.03       .842        .821        .740
VC         398     50.31       .887        .776        .684
VB         225     51.72       .836        .861        .769
P4A        156     51.65       .705        .834        .698
P4B        114     43.30       .384        .343        .489
VAB         39     42.85       .339        .856        .481

Table 3: Correlation coefficients of subsamples.



3.2 Data Of Two Systems Mixed Up 



As a case in point for always doing 
visual analysis of the data we present 
Figures 5, 6 and 7. A small hardware 
change, the addition of 8 ports to an 
installation, had gone unnoticed. We 
had labeled this system VA. Even though 
this change may seem unimportant, Table 
4 shows the effect it had on the
correlation coefficients. The 
regression analysis was affected 
accordingly. The two resulting systems 
were labeled VAB and VAA. 



When analyzing Figure 5 we noticed
and traced the hardware change. In the
data for VAB and VAA we saw that most
correlations improved. Moreover, the
pair nau and np showed such a large
increase that we chose it for display.
Figure 6 has the scattergram for VAB and
Figure 7 that for VAA. The new
correlation values obtained for the two
systems were in greater harmony with the
other VAX measurements. The low value
obtained for nu versus nau in VAB may be
attributed to the use of software that
began running in that system at the time
[6,8].

          Correlation Coefficients
System    nu vs nau   nau vs np   nu vs np
VA          .758        .769        .443
VAB         .492        .906        .606
VAA         .852        .838        .746

Table 4: Correlation Coefficients for mixed
up systems.



4.0 ANALYSIS OF THE RESPONSE TIME 
DISTRIBUTIONS 



In Section 3 we mentioned that the 
scattergrams of the different tasks in 
the different systems consistently
showed that the distributions for 
response time had larger variances and 
larger degrees of skewness for higher 
values of the work load estimator. 



Figures 8 and 9 are histograms, 
obtained using Minitab [16], of the
cpu-bound task in VI and the P7A systems 
respectively. They correspond to the 
measurements made when each of these 
systems had 10 users logged in. When 
done on a larger scale, with 
consequently less clustering, they show 
even larger tails. Inspection of these 
histograms clearly shows that response 
time does not follow a normal 
distribution. 



EACH * REPRESENTS 2 OBSERVATIONS

MIDDLE OF   NUMBER OF
INTERVAL    OBSERVATIONS
    4.          0
    8.         53   ***************************
   12.         28   **************
   16.          8   ****
   20.          8   ****
   24.          4   **
   28.          6   ***
   32.          4   **
   36.          4   **
   40.          1   *

Figure 8: System VI. Clustered histogram of response
time. Task cpu-bound. 10 users logged in.

Figure 9: System P7A. Clustered histogram of response
time. Task cpu-bound. 10 users logged in.

The ideal work load estimator is
that which exactly determines the 
activities in a system. Response time 
measured against an ideal estimator will 
behave as a function; i.e., the 
variance of the measurements is zero or, 
in other words we always obtain the same 
value. It is clear that our estimators 
are fairly rough. Their main advantages 
and interest arise from their easy 
definition, their portability and the 
low overhead involved in their 
measurement. The problem then is to 
find suitable statistical distributions 
to fit our data from which we may obtain 
estimations of our desired performance 
parameters.

A thorough analysis of our
measurements indicated that response
time may always be accurately
approximated by a Gamma distribution.
This distribution offers the appropriate
flexibility to adequately fit our data.
Our problem is then reduced to
characterizing the distribution's
parameters. The difficulty is that we need
to fit one distribution for each value
of the work load estimator. We also
know that the variances are different.
In full generality this requires finding
too many parameters.

We then looked for possible 
relationships between the first and 
second moments of the observed
distributions. Much to our surprise we 
found that there was a high correlation 
between the mean and the variance. 



Figures 10 and 11 depict the VAriance 
plotted against the Average for the 
systems VI and P7A (N7) . The 
measurements were taken from the 
cpu-bound task using nau (AC) as work 
load estimator. The associated 

regression analysis showed that in VI 
92% of the variation in the variance 
could be explained in terms of the 
variation in the mean. This was 68% for 
P7A. Other tasks and systems had the 
same behavior and their values were 
always above 60%. This empirical 
relationship provided the final 
complexity reduction needed to 
appropriately fit a Generalised Linear 
Model [12]. 
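
The mean versus variance analysis can be illustrated with the following Python sketch. It groups response times by the value of a work load estimator, computes the sample mean and variance of each group, and reports the fraction of the variation in the variance explained by a straight-line fit on the mean. The function names and the requirement of at least two observations per group are assumptions of this example.

```python
from collections import defaultdict
from statistics import mean, variance


def mean_variance_by_load(observations):
    """observations: iterable of (load, rt) pairs.

    Returns one (mean, variance) pair per load level, in increasing order
    of load, skipping levels with fewer than two observations.
    """
    groups = defaultdict(list)
    for load, rt in observations:
        groups[load].append(rt)
    return [(mean(v), variance(v))
            for _, v in sorted(groups.items()) if len(v) >= 2]


def r_squared(points):
    """Fraction of the variation in y explained by a least-squares line in x."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)
```

Calling r_squared(mean_variance_by_load(data)) on the measurements of one task and one estimator yields a figure comparable to the percentages quoted above.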



5.0 MODELLING DATA WITH GLIM IN FIVE
SYSTEMS



GLIM is an interactive package for 
modelling data through generalised 
linear models [2,12]. Its utilization 
allows fitting different parameter 
combinations to the data in brief time. 
The theory behind the models is 
presented in Section 5.1. From the 
user's point of view, one defines a 
dependent variable, yvar, which for us 
corresponds to response time, and fits 
its expected value in terms of other 
observed variables. Whenever more than 
one such observed variable exists, as in
our case where we have nu, np and nau,
linear combinations of them can also be
fitted as in ordinary regression
analysis.

Figure 10: System VI. Mean versus variance. Task cpu-bound.
Work load estimator: nau.

Figure 11: System P7A. Mean versus variance. Task cpu-bound.
Work load estimator: nau.

Moreover, GLIM also permits 
choosing the underlying error 

distributions, and has different 
modelling assumptions for each such 
class. For example, normal errors are 
assumed to have the same variance. This 
is not the case for Gamma errors, where 
the variance is assumed to be a fixed 
multiple of the mean. The analysis 
presented in Section 4 supports this 

last hypothesis. Since we had observed 
that the distributions had gamma shapes,
and that the variances were directly 
proportional to the means, we assumed 
that GLIM errors were gamma. 

The other degree of choice which 
GLIM offers is the relationship, or 
link, existing between the expected 
value of yvar and the estimating 
variables. Possibilities include the 
identity relation, the inverse, square 
root, and logarithm among others. If, 
for example, one chooses the logarithm 
relationship, then it means that the 
values of the fitting model and yvar are 
linked through the logarithm; i.e., the 
expected value of yvar is the 
exponentiation of the value given by the 
fitting model. 
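
The role of the link can be made concrete with a small Python sketch. Given an intercept and coefficients for the estimating variables, it forms the linear combination and applies the inverse of the chosen link to obtain the expected value of yvar. The function name, the dictionary of links and any coefficients passed to it are hypothetical and are not values taken from the fitted models.

```python
import math

# Inverse of each link: maps the value of the fitting model (eta)
# back to the expected value of yvar.
INVERSE_LINKS = {
    "identity": lambda eta: eta,
    "inverse": lambda eta: 1.0 / eta,
    "sqrt": lambda eta: eta ** 2,
    "log": math.exp,
}


def expected_value(link, intercept, coefficients, covariates):
    """Expected yvar for one observation under the given link."""
    eta = intercept + sum(b * x for b, x in zip(coefficients, covariates))
    return INVERSE_LINKS[link](eta)
```

For instance, with the logarithm link, an intercept of 0 and a hypothetical coefficient of 0.5 on a covariate whose value is 2, the fitting model gives 1 and the expected value is exp(1).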

It is well known that response time 
as a function of work load estimators 
behaves exponentially [4,5]. Thus the 
logarithm link was used. As a criterion 
for evaluating alternative fitting 
models, we not only considered the 
deviance given by GLIM, which is based on the
maximized likelihood, but also looked at
the sums of squares of the differences
between our estimation and the
observations. GLIM minimizes this in
the presence of normally distributed
errors.



5.1 The Generalized Linear Models



Underlying the concept of a 
statistical model for a random variable 
is the idea that the variable under
investigation has a definite structure 
which will explain the values actually 
obtained as well as predict future 
values. The structure is in fact a 
description of the population and will 
be mirrored in whatever sample we 
obtain. It postulates that the variable 
can be expressed in terms of other more 
basic variables: the components of the 
structure. If these latter variables
have fixed (though possibly unknown)
values they are termed systematic
components, whereas if they too are
random variables they are termed random
components.

Definition of a Generalized Linear Model

Even though one may consider any
random variable Y, where y[i] denotes
its ith sample value, to be
representable by any combination of any
number of components, for most practical
purposes simple structures suffice. A
Generalised Linear Model (GLM) is
determined as follows. Let a set of
independent random variables Y[i] (i =
1, ..., n) have means u[i], so that

Y[i] = u[i] + e[i].

Then there are three basic properties
which define a GLM:

The Error Structure

The probability density function of
Y[i] is given by

p(Y[i]) = exp{ (Y[i]*Q[i] - b(Q[i])) / a[i](F) + c(Y[i],F) }

for suitable choice of a[i], b and c. (Note
that F, termed the scale parameter, is
constant for all i.) The mean and the
variance of Y[i] can then be expressed
in terms of Q[i] and F:

E(Y[i]) = b'(Q[i])
var(Y[i]) = b"(Q[i]) * a[i](F)

where primes denote differentiation with
respect to Q.

For example the Normal distribution
is obtained by setting a[i](F) = F,
b(Q[i]) = 1/2 * Q[i]**2 and c(Y[i],F) =
-1/2 * {log(2*pi*F) + Y[i]**2/F}, where F
would usually be denoted by sigma
squared.

It is convenient to write b"(Q[i])
= t[i]**2, the variance function, which
is a function of u[i] only, so var(Y[i])
= a[i](F) * t[i]**2 = F * t[i]**2 / w[i],
where the w[i] are called the prior
weights, and the functions a[i](F) have
the form F/w[i].



The Linear Predictor 

The role played by the remaining
variables in the structure of each
observation is expressed as a linear sum
of their effects for the observation,
called the linear predictor, n[i]:

n[i] = sum{j=1 to p} x[i,j] * b[j]

where the x[i,j] are known and the b[j]
are (usually unknown) parameters. The
matrix X, of order n x p, is called the
design matrix. The righthand side of
the equation is called the linear
structure. If an x[i,j] represents the
presence or absence of a level of a
factor then b[j] is the effect of that
level; if x[i,j] is the value of a
quantitative covariable then b[j] scales
x[i,j] to give its effect on n[i].

The Link Function 

The relationship between the mean
of the ith observation and its linear
predictor is given by the link function
g[i]:

n[i] = g[i](u[i])

where the g[i] are assumed monotonic and
differentiable. We define h[i] where

u[i] = h[i](n[i])

as the inverse of the link function.

Although each observation could in
theory have a different link function,
this is rare in practice and so the
subscript is dropped.

In summary, a particular GLM can be 
identified by specifying the error 
distribution of the random component, 
the make-up of the linear predictor and
the function linking the means to the 
linear predictors. All error 

distributions must belong to the 
exponential family. 
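
To show how the three ingredients combine in a fit, the following Python sketch estimates a model with Gamma errors, a logarithm link and a linear predictor b0 + b1*x by iteratively reweighted least squares. It is not the GLIM implementation. It assumes the standard Gamma variance function, proportional to the square of the mean, under which the iterative weights for the logarithm link are all equal, so each step reduces to an ordinary least-squares fit of a working response. The function name and the restriction to a single covariate are choices made for brevity.

```python
import math


def fit_gamma_log(xs, ys, iterations=25):
    """Fit E[y] = exp(b0 + b1*x) with Gamma errors; returns (b0, b1).

    All ys must be positive. The fit starts from mu = y.
    """
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    eta = [math.log(y) for y in ys]       # linear predictor at the start
    b0 = b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(e) for e in eta]
        # Working response for the log link: eta + (y - mu) / mu.
        # With variance proportional to mu**2 the weights are constant,
        # so an unweighted least-squares line through (x, z) is one step.
        z = [e + (y - m) / m for e, y, m in zip(eta, ys, mu)]
        mz = sum(z) / n
        b1 = sum((x - mx) * (zi - mz) for x, zi in zip(xs, z)) / sxx
        b0 = mz - b1 * mx
        eta = [b0 + b1 * x for x in xs]
    return b0, b1
```

The fitted expected value for a given x is then exp(b0 + b1*x), which is the inverse link applied to the linear predictor.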



5.2 Summary Of The Analysis Of Five 
Systems 



In this section we present the 
results obtained using GLIM on the data 
of five systems. We have chosen the 
modelling of response time for the C 
compilation task (cc) for display, 
because of the relevance that the C 
programming language has in all UNIX 
systems [11,15]. We display seven 
linear models based on the three work
load estimators np, nu and nau. 

Table 5 summarizes the deviances
and sums of squares of the differences
between the fitted points and the
observed ones. The values given by GLIM
have been divided by the degrees of
freedom of the corresponding samples to
take into account the size of the
sample. We have named these quotients
the normalized deviance and the 
normalized sum of squares. 
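
The two quotients can be sketched in Python as follows. The deviance shown is the usual unscaled deviance for Gamma errors, and the degrees of freedom are taken to be the number of observations minus the number of fitted parameters; both are assumptions of this example, since the exact quantities printed by GLIM are not reproduced here.

```python
import math


def gamma_deviance(ys, mus):
    """Unscaled Gamma deviance of fitted means mus for observations ys."""
    return 2.0 * sum((y - m) / m - math.log(y / m) for y, m in zip(ys, mus))


def sum_of_squares(ys, mus):
    """Sum of squared differences between observations and fitted values."""
    return sum((y - m) ** 2 for y, m in zip(ys, mus))


def normalized(value, n_observations, n_parameters):
    """Divide a goodness-of-fit value by the residual degrees of freedom."""
    return value / (n_observations - n_parameters)
```

Dividing by the degrees of freedom makes values from samples of very different sizes, such as those of VI and VAB, comparable.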

With the exception of P7A, we see 
in Table 5 that in all systems the best 
single estimator was np. In P7A it 
turned out to be nau. We can also see 
that the best estimator pair was np+nu. 
Again P7A was anomalous in this respect. 
The scattergrams for rt versus each of 
our three work load estimators for P7A
show that there is a large degree of 
variability. This system did not have 
the kind of behavior observed in the 
others which was akin to Figures 1, 2 
and 3. Nau does group the measurements 
much better in this case. The model is 
correct when it gives nau as a better 
fit than np. 

Table 5: Normalized deviance and normalized sums of squares for the
different systems and linear estimators. Task: C compilation. 



The failure of np to be the best 
single estimator indicates that the 
users in this system must have a 
peculiarity not present in other 
systems. This may be due to the fact 
that it is heavily used for 
instructional purposes. Since many 
users may be concurrently executing the 
same code, such as a toy operating 
system for example, severe distortions 
from the np estimator point of view may 
exist. For example, the memory 

utilization of several users executing
the same shared piece of code is much
less than it would be if each had his
own copy. Thus the load on the 
resources of the machine is less than 
that suggested by the number of 
processes. This load is clearly more 
faithful to the number of active users. 

Tables 6.1 and 6.2 have all the 
parameters given by GLIM in the analysis 
of the task cc. In Section 5.1 we saw 
how to obtain analytic formulae to 
represent the expected value of rt in 
terms of these parameters. 



It is interesting to notice from 
Tables 6.1 and 6.2 that in several 
systems the addition of nau to the 
estimator np+nu does not increase the 
accuracy of the model. Moreover, there 
are other systems where the normalized 
deviance and the normalized sum of 
squares do improve but the coefficient
associated with nau has a standard error
which renders it not significant. These
observations lead us to conclude that
for most systems nau does not add
significant modelling information in the
presence of np and nu. In the case of a
long data gathering period, after an
exploratory period one could assess
whether nau is indeed needed and, if not,
omit gathering statistics about it.
This will reduce the overhead of the 
method and its total cost, while 
preserving its modelling accuracy. 
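
The significance check described above can be sketched as follows. This is an illustration only: it divides each estimate by its standard error and treats a ratio below 2 as not significant. The threshold of 2 and the function names are assumptions, not part of the GLIM output; the numbers are the np and nau entries of the np+nau estimator as printed in Table 6.2.

```python
def significance_ratio(estimate: float, std_error: float) -> float:
    """Return |estimate| / std_error, a rough t-like ratio."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return abs(estimate) / std_error


def is_significant(estimate: float, std_error: float, threshold: float = 2.0) -> bool:
    # threshold=2.0 is an assumed rule of thumb, not a value taken from the paper
    return significance_ratio(estimate, std_error) >= threshold


if __name__ == "__main__":
    # (system, parameter, estimate, std. error), estimator np+nau, Table 6.2
    rows = [
        ("P7B", "np", 0.4150e-1, 0.3342e-2),
        ("P7B", "nau", -0.321e-2, 0.4794e-2),
        ("P4A", "np", 0.8106e-1, 0.7384e-2),
        ("P4A", "nau", -0.344e-1, 0.1934e-1),
    ]
    for system, name, est, se in rows:
        ratio = significance_ratio(est, se)
        print(f"{system} {name}: ratio={ratio:.2f} significant={is_significant(est, se)}")
```

Under this assumed threshold the nau coefficients fall below the cut-off in both systems while the np coefficients do not, which is the pattern the text describes.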
                              S Y S T E M S
Parameters :         VI          :         P7A         :         VM          :
           : Estimate :Std. Error: Estimate :Std. Error: Estimate :Std. Error:

%GM        : -0.8103  : .5845E-1 : -1.010   : .7959E-1 : .5254    : .1008    :
np         : .3029E-1 : .6104E-3 : .3135E-1 : .6388E-3 : .5679E-1 : .2155E-2 :
scale      :        .1642        :        .4656        :        .1750        :

%GM        : 1.533    : .1329E-1 : 1.736    : .1980E-1 : 2.748    : .1933E-1 :
nu         : .3962E-1 : .8237E-3 : .5547E-1 : .8986E-3 : .8015E-1 : .2963E-2 :
scale      :        .1719        :        .3237        :        .1875        :

%GM        : 1.721    : .1150E-1 : 1.747    : .2234E-1 : 2.467    : .3164E-1 :
nau        : .5028    : .1213E-1 : .6462E-1 : .1200E-2 : .1047    : .4341E-2 :
scale      :     [illegible]     :     [illegible]     :     [illegible]     :

%GM        : -.314E-2 : .6737E-1 : 1.143    : .7197E-1 : 1.104    : .1282    :
np         : .1865E-2 : .8011E-3 : .5721E-2 : .6688E-3 : .3961E-1 : .3075E-2 :
nu         : .2136E-1 : .1057E-2 : .4937E-1 : .1128E-2 : .4032E-1 : .4090E-2 :
scale      :        .1484        :     [illegible]     :        .1609        :

%GM        : -0.2587  : .7452E-1 : .6764    : .7219E-1 : .8455    : .1446    :
np         : .2310E-1 : .8622E-3 : .1001E-1 : .6731E-3 : .4586E-1 : .3966E-2 :
nau        :[illegible]: .1587E-1: .5380E-1 : .1435E-2 : .2758E-1 : .7516E-2 :
scale      :        .1587        :        .3330        :     [illegible]     :

%GM        : 1.544    : .1319E-1 : 1.666    : .2082E-1 : 2.628    : .3608E-1 :
nu         : .2973E-1 : .1349E-2 : .3582E-1 : .2208E-2 : .5818E-1 : .6053E-2 :
nau        : .1696    : .1882E-1 : .2672E-1 : .2791E-2 : .3467E-1 : .8623E-2 :
scale      :        .1679        :        .3135        :        .1855        :

%GM        : .1097E-1 : .7326E-1 : 1.174    : .7104E-1 : .7469    : .1389    :
np         : .1849E-1 : .8680E-3 : .4811E-2 : .6636E-3 : .5351E-1 : .3770E-2 :
nu         : .2198E-1 : .1321E-2 : .3161E-1 : .2243E-2 : .6612E-1 : .5560E-2 :
nau        : .9277E-2 : .1917E-1 : .2527E-1 : .2783E-2 : -.631E-1 : .9850E-2 :
scale      :        .1484        :        .3085        :        .1564        :

Table 6.1: GLM parameters for three systems, and seven linear
estimators. Task: C compilation.
                    S Y S T E M S
Parameters :         P7B         :         P4A         :
           : Estimate :Std. Error: Estimate :Std. Error:

%GM        : -1.229   : .1529    : .6347    : .1491    :
np         : .3951E-1 : .1401E-2 : .7122E-1 : .4629E-2 :
scale      :        .2408        :        .1133        :

%GM        : 2.060    : .4108E-1 : 2.465    : .4433E-1 :
nu         : .4390E-1 : .1626E-2 : .1509    : .1258E-1 :
scale      :        .2566        :        .1463        :

%GM        : 2.030    : .4724E-1 : 2.162    : .8466E-1 :
nau        : .5319E-1 : .2201E-2 : .1395    : .1466E-1 :
scale      :        .2888        :        .1657        :

%GM        : -.1511   : .2514    : .9139    : .1685    :
np         : .2589E-1 : .2892E-2 : .5752E-1 : .6052E-2 :
nu         : .1728E-1 : .3250E-2 : .5109E-1 : .1448E-1 :
scale      :        .2334        :        .1091        :

%GM        : -1.381   : .2799    : .5103    : .1673    :
np         : .4150E-1 : .3342E-2 : .8106E-1 : .7384E-2 :
nau        : -.321E-2 : .4794E-2 : -.344E-1 : .1934E-1 :
scale      :        .2410        :        .1125        :

%GM        : 2.112    : .4508E-1 : 2.212    : .7903E-1 :
nu         : .5862E-1 : .5897E-2 : .1140    : .1604E-1 :
nau        : -.198E-1 : .7526E-2 : .6519E-1 : .1757E-1 :
scale      :        .2546        :        .1402        :

%GM        : -0.9839  : .2656    : .7749    : .1755    :
np         : .3762E-1 : .3166E-2 : .7059E-1 : .7633E-2 :
nu         : .5266E-1 : .5435E-2 : .6126E-1 : .1486E-1 :
nau        : -.637E-1 : .7889E-2 : -.560E-1 : .1955E-1 :
scale      :        .2157        :        .1068        :

Table 6.2: GLM parameters for two systems and
seven linear estimators. Task: C compilation.



5.3 Robustness Of The Results In Three 
Systems 



In Tables 7, 8 and 9 we present the
parameters given by GLIM when random
subsamples of different sizes of the
same distribution were analyzed. We
considered the same task as in Section
5.2: the C compilation. These
subsamples were made from the total
sample by a program which selected
random entries with a fixed probability
for membership. We created three files
with membership probabilities of .5, .25
and .125, to check the robustness of the
modelling technique.

From the appropriate entries in 
Tables 6.1 and 6.2 we may see that the 
coefficients found in Tables 7, 8 and 9 
are very robust. In almost all of the 
cases (the exception being the 170-point 
subsample in System VM) the estimators 
for the smaller samples were within 
their standard error from those found in 
the larger samples. 
Parameter :  Sample Size 1497    :  Sample Size 729     :  Sample Size 376     :
          : Estimator:Std. Error : Estimator:Std. Error : Estimator:Std. Error :

%GM       : .5640    : .1021     : .6707    : .1404     : .6130    : .1906     :
np        : .112E-1  : .9473E-3  : .9783E-2 : .1304E-2  : .1069E-1 : .1755E-2  :
nau       : .5146E-1 : .202E-2   : .5608E-1 : .2785E-2  : .5173E-1 : .4124E-2  :
scale     :  .3?93 (digit illegible) :      .3146       :       .3395          :

Table 7: System P7B. Robustness of the selected linear estimator.
Task: C compilation.



Parameter :  Sample Size 1722    :  Sample Size 851     :  Sample Size 428     :
          : Estimator:Std. Error : Estimator:Std. Error : Estimator:Std. Error :

%GM       : -.1275   : .9481E-2  : -.6774E-2: .1405     : .6659E-1 : .1886     :
np        : .2020E-1 : .1128E-2  : .1893E-1 : .1652E-2  : .1765E-1 : .2264E-2  :
nu        : .2008E-1 : .1502E-2  : .2032E-1 : .2060E-2  : .2341E-1 : .2955E-2  :
scale     :       .1506          :       .1422          :       .1309          :

Table 8: System VI. Robustness of the selected linear estimator.
Task: C compilation.



Parameter :  Sample Size 739     :  Sample Size 365     :  Sample Size 170     :
          : Estimator:Std. Error : Estimator:Std. Error : Estimator:Std. Error :

%GM       : 1.348    : .1828     : 1.428    : .2424     : -.5394E-1: .3657     :
np        : .3401E-1 : .4387E-1  : .3209E-1 : .5783E-2  : .6752E-1 : .8716E-2  :
nu        : .4529E-1 : .5835E-2  : .4458E-1 : .7294E-2  : -.1081E-2: .1062E-1  :
scale     :       .1742          :       .1553          :       .1186          :

Table 9: System VM. Robustness of the selected linear estimator.
Task: C compilation.



However, the price paid for fewer 
data points was much larger standard 
errors of the fitting estimators. Given 
that each smaller sample was roughly 
half of the larger one, we see that 
standard errors diminished approximately 
25% when the size of the sample doubled. 
This rule can be a guide in deciding the 
duration of the data gathering period as 
a function of the desired accuracy. 
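
The rule of thumb above can be compared with the usual statistical expectation that a standard error shrinks roughly as 1/sqrt(n). The sketch below is illustrative: the 1/sqrt(n) scaling is a textbook assumption (independent observations), not a claim made in the paper, and the function names are hypothetical. The observed values are the %GM standard errors printed in Table 7.

```python
import math


def relative_reduction(se_before: float, se_after: float) -> float:
    """Fractional drop in standard error when moving to the larger sample."""
    return 1.0 - se_after / se_before


def points_needed(n_current: int, se_current: float, se_target: float) -> int:
    """Sample size for a target standard error, assuming SE ~ 1/sqrt(n)."""
    return math.ceil(n_current * (se_current / se_target) ** 2)


if __name__ == "__main__":
    # %GM standard errors from Table 7 for 376, 729 and 1497 points
    observed = [0.1906, 0.1404, 0.1021]
    for before, after in zip(observed, observed[1:]):
        print(f"observed reduction: {relative_reduction(before, after):.1%}")
    # Prediction of the assumed 1/sqrt(n) law for a doubled sample
    print(f"1/sqrt(n) prediction: {1.0 - 1.0 / math.sqrt(2.0):.1%}")
    # Points needed to bring the %GM standard error down to an example target
    print(points_needed(376, 0.1906, 0.10))
```

Used this way, `points_needed` turns a desired accuracy into an approximate length for the data gathering period.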
We also observed that for the
best linear estimators in each system,
those found from samples which had more
than 300 points showed high levels of
significance, i.e., the standard errors
of the estimators were small. This
gives us an empirical global bound on
the minimum number of data points one
should gather in a system.

6.0 CONCLUSIONS



We have found that response time 
has a statistical behavior which is 
consistent enough to permit the 
utilization of generalized linear models 
(GLMs) to describe and predict its 
values. This, in turn, has validated
the utilization of rather imprecise work
load estimators, such as number of
processes (np), number of users (nu) and
number of active users (nau), as a basis
for estimating selected performance
indices in a computer installation.

Moreover, the robustness of the
models also permits comparisons of
different computer installations using
the terminal probe method as data
gathering technique while the systems
are executing their natural work loads.
We have also seen that a minimum of 300
points per installation appears to be
necessary for obtaining a minimum of
accuracy, and that doubling the size of
the sample will reduce by about 25% the
standard error of the parameters
estimated by the model. We may then
assess much better the cost of achieving
a predetermined error bound.

When measured against our work load 
estimators np, nu and nau, response time 
appears to have Gamma distributed 
values. What is more, we have 
empirically observed that the means and 
variances of these distributions are 
highly correlated as functions of each 
of our work load estimators. Their 
ratio is almost directly proportional to 
the values of the work load estimator. 
This behavior is essential when using a 
GLM with Gamma error, and gives us a 
better understanding of the behavior of 
response time. 

We could also observe that for most 
systems and tasks, nau adds no 
significant additional information to 
that given by np and nu. This fact can 
be used to reduce the data gathering 
cost and the overhead of the terminal 
probe method.

SOME ELEMENTS OF SOFTWARE FUNCTION
AND COST ANALYSIS AS RELATED 
TO PERFORMANCE 

James E. Gaffney, Jr 

Federal Systems Division 
Gaithersburg, MD 20017 



This paper provides some elements of modern software function and cost
analysis. Proper emphasis on the software process is basic to ensuring that the 
software will perform as specified. Subjects covered are: the establishment of 
requirements, life cycle management and costing. They are all attributes of the 
software management process. 

Keywords: Costing, life cycle management, requirements, and software management. 



1. Requirements

Good communication between the software
producer and the user is central to the
realization of the software performance goals
derived by the user. This means establishing
what functions are to be provided and agreeing
to cost, schedule, and quality objectives, and
appropriate measures of them.

"Good management of software involves both 
producer and user, and starts with a clear 
statement of the functions that the 
software is to provide and continues with a 
methodology that provides the technical 
controls and resource management to produce 
high quality software within acceptable
funding and time limits."

The software implementation process may be said
to consist of four stages: the development of
requirements, the creation of a system
description, the creation of a design, and
finally the writing of the code that will
effect the portion of the requirements
addressed by the software (others may be
addressed by hardware per se). Each of these
stages is increasingly specific in expressing
desired functionality. The first three address
people; only the last addresses the computer
itself. The program implementation at each
stage is a vehicle for communication of
function. Each stage is an elaboration and
implementation of the previous stage. Ideally,
the progression from requirements to code is
linear; actually, there is often feedback among
the stages.

The establishment of requirements is often a
very difficult task. Requirements include
statements about the functionality to be
provided, and also designation of the spatial
and temporal performance objectives that the
software is to satisfy. Proper attention to
requirements is important because quality may
be defined as "conformance to requirements."
Most generally, such "conformance" means that 
the product meets the needs of the user and 
satisfies stated performance criteria. Some 
specific measures of software quality that have 
been employed are: 

o latent error content - defects present
at time of delivery (estimated)

o mean time between failure

o number of defects found during some test
period.

It can be said that "quality" is an aspect of 
"product integrity" which includes, among other 
aspects of the software development process, 
adherence to schedule and cost objectives. 

2. Life Cycle Management

The basic objective of software development
management is to produce a desired amount of
function at a specified quality level, within a
given cost and schedule envelope. Appropriate
quantitative and qualitative management
techniques should be applied to identify and
control the stages of the software development
process and assure the quality of the resultant
product. A quantitative management support
system is suggested as an aid to the life cycle
management process. Prior to the initiation of
the development process, basic parameters of
the software process and product, including
amount of function (indicated by number of
lines of code or other appropriate
measure), intended cost, and schedule, should be
estimated. During the development and
maintenance/operations portions of the life
cycle, parameters of the process and product
can be monitored for adherence to objectives or
requirements established earlier, and
corrections to the process and product may be
made as appropriate. At various stages of the
life cycle, data about both the process and the
product can be collected. This information can
be used to make corrections to the
estimation/prediction models, and more
generally, to support the objective of learning
from experience in a structured manner so that
"it can be done better the next time." A key
element of the software process is that there
be a continuing interchange of data between the
management support system and the (software)
life cycle process.

3. Costing 

The cost of software, both development and 
maintenance/operations, may be computed from 
the general relationship: 



labor (man months) = function developed x development rate



There are a number of measures of function
'size' available. Probably the most commonly
used is 'source lines of code' (SLOC). There
are alternatives to this measure, such as
'function points' developed by Albrecht.
This measure relates the amount of "function"
the software is to provide to the data it is to
use (absorb) and to generate (produce).
'Function points' are relatable to SLOC.
Also, Britcher and Gaffney have suggested,
based on the state machine model of a software
system, another way of measuring the amount
of function to be provided by a software
system. They observe that any software system
should have the same number of 'levels' of
elaboration of function. Hence, one should be
able to produce an estimate based on the number
of "boxes" at a certain functional level in
recognition that, on the average, about the
same amount of function should be resident in a
"box" at a given level in the specification
hierarchy. The amount of function (say
represented by SLOC) is a key parameter to be
used by the performance analyst, even before
the code has been written. Indeed, as Smith
has described, "... a static analysis [can be
used] to derive the mean, best case, and worst
case response times. The static analysis is
based on the optimistic assumption that there
are no other jobs in the host configuration
competing for resources." This type of
analysis deals with what has been termed
monoprogramming performance.

The 'development rate' specifies the rate (say
in man months per thousand SLOC or per function
point) at which the function is to be
developed. The 'development rate' should be
taken as the sum of the rates for each of the
relevant activities or work components that
constitute the particular software development
process. These components are the basis
for a software engineering management model
used by the Federal Systems Division of IBM.
Sixteen work components have been identified
from which the software organization or the
engineering organization involved in a software
development project can structure its
particular activities. They are:

o software requirements definition

o software system description

o software development planning

o engineering change analysis

o functional design

o program design

o test design

o software tools

o design evaluation

o module development

o development testing

o problem analysis and error correction

o software system test procedures

o software integration and test

o system test support

o acceptance test support

Thus, a cost estimate can be made by
considering the nature of the particular
software development job and the work
components (such as program design, coding,
etc.) that constitute it. Then, the labor (man
months) for each component is estimated. The
sum of these man month figures is the amount
required for the given job. Considering the
development process in terms of its
constituents enables the estimator to achieve a
greater degree of intellectual control than if
he were to evaluate the process overall. For
example, it may not be clear how the
availability of a new process that facilitates
unit testing would impact development
productivity. However, its effect on the work
component that covers unit test would be much
easier to discern.
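
The component-wise estimate described above can be sketched as a small calculation. This is a minimal illustration under stated assumptions: size is expressed in thousands of SLOC, and the rates in the example are made-up placeholders, not figures from the paper or from any IBM model. The function names are hypothetical.

```python
from typing import Dict


def estimate_labor(size_ksloc: float, rates: Dict[str, float]) -> Dict[str, float]:
    """Man months per work component: size (KSLOC) x rate (man months per KSLOC)."""
    if size_ksloc < 0 or any(rate < 0 for rate in rates.values()):
        raise ValueError("size and rates must be non-negative")
    return {component: size_ksloc * rate for component, rate in rates.items()}


def total_labor(size_ksloc: float, rates: Dict[str, float]) -> float:
    """Total man months: the sum of the per-component figures."""
    return sum(estimate_labor(size_ksloc, rates).values())


if __name__ == "__main__":
    # Hypothetical rates (man months per KSLOC) for three of the work components
    rates = {
        "program design": 0.5,
        "module development": 1.5,
        "development testing": 1.0,
    }
    for component, months in estimate_labor(10.0, rates).items():
        print(f"{component}: {months:.1f} man months")
    print(f"total: {total_labor(10.0, rates):.1f} man months")
```

Keeping the rates per component makes it easy to change one rate, for example the one covering unit test, and see the effect on the total.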
The parameters of the model for estimating
software development described above would be
expected to be modified, based on the actual
experience of the development organization for
the type(s) of code concerned, as suggested in
the section on life cycle management. This
same model can support the management of
maintenance and enhancement activities,
treating them as a sequence of "development
efforts." Depending upon the amount of change
relative to the baseline system, then, the
maintenance and operations effort can be
handled as a new project (albeit with a lot of
retained code) rather than as a continuation of
the old.

4. Conclusions

Both the developers and users of software are 
concerned with issues of performance. 
Obtaining good software begins with the 
establishment of a proper set of requirements 
and continues with management that focuses on 
the development of the desired software 
functionality to be provided at a level of 
quality within an agreed upon cost and schedule 
envelope. There are tools and techniques
available to support software management with
quantitative assessments of the software
process. They can contribute significantly to
the software exhibiting the level of
performance intended for it.

BENCHMARK AND CONVERSION TOOL
TEST DATA REDUCTION PROGRAM



Frances A. Kazlauski 

Naval Data Automation Command 
Washington, D.C. 



The Test Data Reduction Program (TDRP) is a software tool for use as a COBOL
program conversion aid or benchmark testing aid. It ensures that newly developed,
enhanced or converted COBOL programs are tested as thoroughly as required by
employing existing production data files and utilizing fewer computer resources.
It is used to extract appropriate data records from the production data file and
create a reduced data file. The reduced data file will achieve the same level of
testing coverage as the original production file and may be used for future
program testing.

Key words: Benchmark; COBOL; conversion; coverage; extract; reduced.



1. Purpose

The purpose of this paper is to explain the
capability and use of the Test Data Reduction
Program (TDRP). TDRP is a software tool for the
UNIVAC 1100 series and IBM compatible computers.
It provides data reduction capability for
sequential or indexed-sequential files.

2. Background

There are two common approaches to the
preparation of test data: the generation of test
data sets and the selection of a subset of
records from an existing data file. The latter
approach is more cost-effective for benchmarks
or conversion projects. A method is needed
which will extract a minimally required data set
from an existing data file for thorough program
testing. To assure that the extracted records
perform the same coverage of testing as the
original data file, program execution must be
monitored through the insertion of probes. The
implementation of such a capability provides a
cost-effective vehicle for thoroughly testing
benchmarks or converted software since testing
can be done using smaller data files and less
machine time. In addition, it also improves the
reliability and validity of the tested software. 
The requirement for this capability led to the 
development of the Test Data Reduction Program 
(TDRP) in May 1980 at the Navy Regional Data 
Automation Center, Washington, D.C. The project 
sponsor is the Naval Data Automation Command 
(NAVDAC) Code 40. 



3. Objectives

The basic objectives of TDRP are:

o Extract records from an existing
data file for program testing

o Ensure the same coverage of program
testing is achieved for the extracted
data as for the original data

o Achieve a "minimally-thorough-test"
criterion

The "minimally-thorough-test" criterion ensures
that each statement in a program is executed at
least once. An additional feature of TDRP may be
the development of more comprehensive test data
for acceptance testing since the TDRP output
report shows all untested code in the program.

4. Processing 

TDRP is a test data extractor, not a test 
data generator. The program operates on a pro- 
duction system for which a data file already 
exists. It extracts records from this existing 
data file and creates a new test file based upon 
program logic and a user specified reduction 
factor. When a new logic path is taken, the
current record is saved in the reduced file. 
Additional records may be saved to obtain the 
proper file size specified by the user. 
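
The selection rule described above can be sketched as a greedy pass over the data file. This is a simplified illustration, not TDRP's actual algorithm: it approximates "a new logic path" by "at least one probe not hit by any earlier record", it ignores the reduction factor, and the names `reduce_records` and `probes_hit` are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple, TypeVar

Record = TypeVar("Record")


def reduce_records(
    records: Iterable[Record],
    probes_hit: Callable[[Record], Iterable[Hashable]],
) -> Tuple[List[Record], Set[Hashable]]:
    """Keep a record only if processing it reaches a probe not seen before."""
    seen: Set[Hashable] = set()
    kept: List[Record] = []
    for record in records:
        hit = set(probes_hit(record))
        if not hit <= seen:      # the record exercised something new
            kept.append(record)
            seen |= hit
    return kept, seen


if __name__ == "__main__":
    # Toy stand-in for an instrumented program: which probes a record triggers
    def toy_probes(value: int) -> List[str]:
        probes = ["start", "even" if value % 2 == 0 else "odd"]
        if value > 100:
            probes.append("large")
        return probes

    reduced, covered = reduce_records([2, 4, 3, 5, 200, 7], toy_probes)
    print(reduced)
    print(sorted(covered))
```

Because every probe reached by the full file is also reached by the kept records, the reduced file gives the same probe coverage as the original under this simplified model.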
TDRP sequential file processing consists of
four phases: reformatting, instrumenting,
executing and reporting. For indexed-sequential
files which are randomly-accessed, an additional
sort phase is required to eliminate duplicate
records.

In the reformatting phase, the COBOL source 
program is put into the form required by the 
Instrumentation phase. The reformatted program 
created in this phase contains one COBOL verb 
per line of code. A table of verb frequencies 
is also displayed at the end of the reformatted 
program. A portion of a reformatted COBOL 
source program is shown in Figure 1.

The instrumentation phase uses the reformatted
COBOL source program and in addition reads
the user parameter card file. The mandatory
parameter card which specifies the data file to
be reduced contains the following information:

o Maximum record length

o File name

o Record name 

o Starting position for indexed key 

o Key length for indexed files 

o File type 

o Blocking factor 

o Percentage reduction factor (number 
between 0 and 50 where 0 indicates 
the minimally-thorough-test) 

The user may provide optional parameter cards
containing COBOL SELECT, FD and RECORD description
clauses to describe the file to be reduced
in the ENVIRONMENT and DATA divisions. If the
SELECT card is present, the FD and RECORD clause
cards must be coded. Once the parameter card
file and reformatted source program are read in
by the instrumentation phase, code is inserted
in the reformatted program. This code enables
the instrumented program to communicate with a
data collection routine which collects execution
statistics and monitors data record selection
during execution. A portion of the instrumented
code for the reformatted program in the previous
figure is shown in Figure 2.
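
The probe insertion can be sketched for a single paragraph as follows. The placement rule used here is inferred from the pattern visible in Figure 2 and is not stated in the text: one 'TDRP20' call at paragraph entry, one before each IF sentence, one before a sentence that follows an IF sentence, and a 'TDRP30' call on the true path directly after the IF condition line. ELSE branches and nested conditionals are ignored, and the function name and input layout (a sentence is a list of reformatted lines) are hypothetical.

```python
from typing import List, Tuple


def instrument_paragraph(
    name: str, sentences: List[List[str]], first_index: int
) -> Tuple[List[str], int]:
    """Return the instrumented lines and the next unused probe index."""

    def probe(index: int, routine: str, end: str) -> str:
        return f"MOVE {index} TO CURR-INDEX, CALL '{routine}'{end}"

    index = first_index
    lines = [f"{name}.", probe(index, "TDRP20", ".")]   # paragraph entry probe
    index += 1
    previous_was_if = False
    for sentence in sentences:
        is_if = sentence[0].lstrip().upper().startswith("IF ")
        if is_if:
            lines.append(probe(index, "TDRP20", "."))   # counts the IF itself
            lines.append(sentence[0])
            lines.append(probe(index, "TDRP30", ""))    # counts the true path
            lines.extend(sentence[1:])
            index += 1
        else:
            if previous_was_if:                          # block after a conditional
                lines.append(probe(index, "TDRP20", "."))
                index += 1
            lines.extend(sentence)
        previous_was_if = is_if
    return lines, index


if __name__ == "__main__":
    body = [
        ["IF IM-JOB EQUALS HOLD-JOB", "GO TO EVEN-MAT."],
        ["MOVE IN-MSTR TO OT-MSTR."],
    ]
    out, next_index = instrument_paragraph("CK-MSTR", body, 119)
    print("\n".join(out))
    print(next_index)
```

With one counter for the IF and one for its true path, the false-path count could be obtained as the difference of the two; whether TDRP computes it that way is an assumption.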

For randomly-accessed indexed sequential 
file processing, a skeleton COBOL program is 
read, modified and written to a file. This 
program will be executed later to sort the 
extracted data records. 

In the execution phase, the instrumented 
COBOL program is compiled, mapped or linked, and 
executed. The TDRP subroutine for statistical 
data collection and data record extraction must 
be included in the collection process. During
execution, the appropriate data records are
written to the reduced file. In addition,
execution statement counters are incremented and
written to the statistical data collection file.

The reporting phase uses the reformatted
COBOL source code and the statistical data file.
The report lists the reformatted source code and
the execution count for each statement. In
addition, for conditional statements, the
execution counts are also given for the "true"
and "false" paths. This report may help in two
ways:

o To evaluate the validity of the data
file as a baseline to produce the
reduced file

o To facilitate the creation of 

additional data records to assure 
completeness of program testing 

A summary of program testing coverage is also 
included and consists of the following: 

o Number of COBOL source statements 

o Number of unexecuted statements 

o Number of conditional statements 

o Number of unexecuted conditional
statements

o Percentage of unexecuted statements 

o Percentage of unexecuted conditional 
statements 
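
The summary figures listed above can be derived from per-statement execution counts, as in the sketch below. It is illustrative only: the `Statement` record and the function name are hypothetical, and a conditional is counted as unexecuted when its own count is zero, which may differ from the definition TDRP uses.

```python
from typing import Dict, List, NamedTuple


class Statement(NamedTuple):
    text: str
    count: int          # number of times the statement was executed
    conditional: bool   # True for conditional (IF) statements


def coverage_summary(statements: List[Statement]) -> Dict[str, float]:
    """Compute the six summary figures of the coverage report."""
    total = len(statements)
    unexecuted = sum(1 for s in statements if s.count == 0)
    conditionals = [s for s in statements if s.conditional]
    unexecuted_cond = sum(1 for s in conditionals if s.count == 0)

    def percent(part: int, whole: int) -> float:
        return 100.0 * part / whole if whole else 0.0

    return {
        "statements": total,
        "unexecuted statements": unexecuted,
        "conditional statements": len(conditionals),
        "unexecuted conditional statements": unexecuted_cond,
        "percent unexecuted statements": percent(unexecuted, total),
        "percent unexecuted conditional statements": percent(
            unexecuted_cond, len(conditionals)
        ),
    }


if __name__ == "__main__":
    program = [
        Statement("IF IM-JOB EQUALS HOLD-JOB", 12, True),
        Statement("GO TO EVEN-MAT.", 3, False),
        Statement("IF H-ONE EQUALS HIGH-VALUES", 0, True),
        Statement("GO TO FIN-CL.", 0, False),
    ]
    for label, value in coverage_summary(program).items():
        print(f"{label}: {value}")
```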

No provision is made to determine if the
program logic is correct; however, the report does
show the statements executed utilizing the user
data file. The correctness of the logic is
difficult to check since recent studies show that
most of the serious program errors are errors of
omission due to incorrect or misinterpreted
functional specifications or requirements (1).

An example of a portion of a TDRP output 
report is illustrated in Figure 3. 

5. Considerations 

The potential savings in machine time and 
storage requirements can justify the cost of CPU 
time overhead, additional core requirements and 
the number of TDRP runs. 

The CPU time overhead for compilation and 
execution is the result of the number of COBOL 
statements and subroutine calls inserted. The 
total number of inserted statements will be 
proportional to the number of paragraphs and 
conditional statements in the original program. 



Figures in parentheses indicate the
literature references at the end of this paper.
CK-MSTR.
    IF IM-JOB EQUALS HOLD-JOB
        GO TO EVEN-MAT.
    IF IM-JOB IS GREATER THAN HOLD-JOB
        MOVE 1 TO CK-CD
        GO TO
            UNEVEN.
    MOVE IN-MSTR TO OT-MSTR.
    IF OM-3 EQUALS 5 AND OM-4 IS LESS THAN TODAYS-DT
        MOVE 3 TO
            OM-3
        PERFORM PR-PUN.
    IF OM-3 EQUALS 5 AND OM-5 IS LESS THAN TODAYS-DT
        MOVE 4 TO
            OM-3
        PERFORM PR-PUN.
    IF OM-6 EQUALS 8 AND OM-7 IS LESS THAN TODAYS-DT
        MOVE 7 TO
            OM-6
        PERFORM PR-PUN.
    IF OM-6 EQUALS 8 AND OM-8 IS LESS THAN TODAYS-DT
        MOVE 9 TO
            OM-6
        PERFORM PR-PUN.
    IF OM-JOB IS UNEQUAL TO SPACES
        WRITE OT-MSTR
        ADD 1 TO MSTR-1100-CT.
    GO TO CK-WHIP.
EVEN-MAT.
    MOVE IN-MSTR TO OT-MSTR.
    MOVE 3 TO CK-CD.
UNEVEN.
    IF H-ONE IS UNEQUAL TO SPACES
        PERFORM MOV-H1.
    IF H-TWO IS UNEQUAL TO SPACES
        PERFORM MOV-H2.
    IF H-THREE IS UNEQUAL TO SPACES
        PERFORM MOV-H3 THRU EX-1.
    IF H-FOUR IS UNEQUAL TO SPACES
        PERFORM MOV-H4 THRU EX-1.
    IF OM-JOB IS UNEQUAL TO SPACES
        PERFORM PR-PUN.
    IF OM-J1B EQUALS 3
        MOVE OM-J1C TO OM-36A
        MOVE OM-J1A TO
            OM-36B
    ELSE
        MOVE OM-JOB TO OM-36.
    MOVE SPACES TO H-ONE H-TWO H-THREE H-FOUR.
    IF OM-JOB IS UNEQUAL TO SPACES
        WRITE OT-MSTR
        ADD 1 TO MSTR-1100-CT.
CK-WHIP.
    IF OM-JOB EQUALS HOLD-W-JOB
        GO TO MAT-WHIP.
    IF HOLD-W-JOB IS LESS THAN OM-JOB
        GO TO LO-WHIP.
    MOVE SPACES TO OT-MSTR.
    IF CK-CD EQUALS 1
        GO TO MOV1.
    IF CK-CD EQUALS 3
        ALTER 1-SW
            TO PROCEED TO MOV1
    ELSE
        ALTER 1-SW



Figure 1. Reformatted COBOL Source Program 
            TO PROCEED TO CK-MSTR.
    MOVE 0 TO CK-CD.
    GO TO READ-M.
MAT-WHIP.
    MOVE SPACES TO OT-MSTR.
    IF CK-CD EQUALS 1
        ALTER 2-SW
            TO PROCEED TO MOV1
        GO TO
            WHIP-MOV.
    IF CK-CD EQUALS 3
        ALTER 2-SW
            TO PROCEED TO MOV1
    ELSE
        ALTER 2-SW
            TO PROCEED TO CK-MSTR.
    ALTER 1-SW
        TO PROCEED TO WHIP-MOV.
    MOVE ZERO TO CK-CD.
    GO TO READ-M.
LO-WHIP.
    IF TBLE-CTR IS LESS THAN 999
        ADD 001 TO TBLE-CTR.
    MOVE HOLD-W-JOB TO TAB-JOB (TBLE-CTR).
LO-W-RTN.
    ALTER 2-SW
        TO PROCEED TO CK-WHIP.
    GO TO WHIP-MOV.
CLOS-MSTR.
    IF S-REC EQUALS HIGH-VALUES AND CR-IN EQUALS HIGH-VALUES
        GO
            TO FIN-CL.
    MOVE HIGH-VALUES TO IN-MSTR.
    GO TO 1-SW.
CLOS-WHIP.
    IF S-REC EQUALS HIGH-VALUES AND IN-MSTR EQUALS HIGH-VALUES
        GO TO FIN-CL.
    MOVE HIGH-VALUES TO CR-IN.
    GO TO 1A-SW.
CLOS-DET.
    IF IN-MSTR EQUALS HIGH-VALUES AND CR-IN EQUALS HIGH-VALUES
        GO TO FIN-CL-DET.
    MOVE HIGH-VALUES TO S-REC.
    GO TO 3-SW.
FIN-CL-DET.
    IF H-ONE EQUALS SPACES AND H-TWO EQUALS SPACES AND H-THREE
        EQUALS SPACES AND H-FOUR EQUALS SPACES
        GO TO FIN-CL.
    IF H-ONE EQUALS HIGH-VALUES
        GO TO FIN-CL.
    PERFORM UNEVEN.
FIN-CL.
    CLOSE MSTR-IN CR-FILE MSTR-OT PUNCH-OT
    DISPLAY '1100 MASTER RCDS ' MSTR-1100-CT UPON PRINTER.
    MOVE 99 TO LIN-CTR.
    MOVE ZEROES TO PG-CTR.
WR-WHIP.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    IF LIN-CTR IS GREATER THAN 50
        PERFORM WHIP-HD.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB1.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB2.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB3.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB4.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB5.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB6.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB7.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB8.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB9.
    WRITE P-REC BEFORE ADVANCING 2 LINES.
    MOVE SPACES TO P-REC.
    ADD 02 TO LIN-CTR.
    GO TO WR-WHIP.
LAST-CLOS.
    WRITE P-REC BEFORE 01.
    CLOSE P-FILE.
    STOP RUN.
WHIP-HD.
    MOVE SPACES TO P-REC.
    WRITE P-REC BEFORE ADVANCING NEXT-PG.
    ADD 0001 TO PG-CTR.
    MOVE 'LISTING OF JOBS ON LINK NO/ JOB ORDER FILE BUT NOT ON
    ' MAST' TO P-HD2-W.
    MOVE 'ER JOB ORDER FILE FOR ' TO P-HD2A-W.
    MOVE TODAYS-DT TO P-DT-W.
    MOVE 'PAGE ' TO P-W-PG.
    MOVE PG-CTR TO P-W-ID.
    WRITE P-REC BEFORE ADVANCING 2 LINES.
    MOVE SPACES TO P-REC.
    MOVE ' JOB-NO JOB-NO JOB-NO JOB-NO J
    'OB ' TO P-HD2.
    MOVE '-NO JOB-NO JOB-NO JOB-NO JOB-NO
    ' ' TO P-HD2A.
    WRITE P-REC BEFORE ADVANCING 2 LINES.
    MOVE SPACES TO P-REC.
    MOVE 04 TO LIN-CTR.

*** REFORMAT PROGRAM ENDED ***
TABLE OF VERB FREQUENCIES

VERB        COUNT

ADD            19
ALTER           9
CLOSE           5
DISPLAY         2
ELSE           17
END             6
EXAMINE         2
EXIT            2
GO             78
IF            105
INPUT           4
MOVE          175
NEXT           14
ON      [illegible]
OPEN            5
OUTPUT          3
PERFORM        14
READ            5
RELEASE         2
RETURN          1
SORT            1
STOP            2
TALLYING        2
WRITE          17
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CK-MSTR. 



    MOVE 119 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE 120 TO CURR-INDEX, CALL 'TDRP20'
    IF IM-JOB EQUALS HOLD-JOB
        MOVE 120 TO CURR-INDEX, CALL 'TDRP30'
        GO TO EVEN-MAT.
    MOVE 121 TO CURR-INDEX, CALL 'TDRP20'.
    IF IM-JOB IS GREATER THAN HOLD-JOB
        MOVE 121 TO CURR-INDEX, CALL 'TDRP30'
        MOVE 1 TO CK-CD
        GO TO
            UNEVEN.
    MOVE 122 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE IN-MSTR TO OT-MSTR.
    MOVE 123 TO CURR-INDEX, CALL 'TDRP20'
    IF OM-3 EQUALS 5 AND OM-4 IS LESS THAN TODAYS-DT
        MOVE 123 TO CURR-INDEX, CALL 'TDRP30'
        MOVE 3 TO
            OM-3
        PERFORM PR-PUN.
    MOVE 124 TO CURR-INDEX, CALL 'TDRP20'.
    IF OM-3 EQUALS 5 AND OM-5 IS LESS THAN TODAYS-DT
        MOVE 124 TO CURR-INDEX, CALL 'TDRP30'
        MOVE 4 TO
            OM-3
        PERFORM PR-PUN.
    MOVE 125 TO CURR-INDEX, CALL 'TDRP20'.
    IF OM-6 EQUALS 8 AND OM-7 IS LESS THAN TODAYS-DT
        MOVE 125 TO CURR-INDEX, CALL 'TDRP30'
        MOVE 7 TO
            OM-6
        PERFORM PR-PUN.
    MOVE 126 TO CURR-INDEX, CALL 'TDRP20'.
    IF OM-6 EQUALS 8 AND OM-8 IS LESS THAN TODAYS-DT
        MOVE 126 TO CURR-INDEX, CALL 'TDRP30'
        MOVE 9 TO
            OM-6
        PERFORM PR-PUN.
    MOVE 127 TO CURR-INDEX, CALL 'TDRP20'.
    IF OM-JOB IS UNEQUAL TO SPACES
        MOVE 127 TO CURR-INDEX, CALL 'TDRP30'
        WRITE OT-MSTR
        ADD 1 TO MSTR-1100-CT.
    MOVE 128 TO CURR-INDEX, CALL 'TDRP20'.
    GO TO CK-WHIP.



EVEN-MAT.
    MOVE 129 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE IN-MSTR TO OT-MSTR.
    MOVE 3 TO CK-CD.
UNEVEN.
    MOVE 130 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE 131 TO CURR-INDEX, CALL 'TDRP20'
    IF H-ONE IS UNEQUAL TO SPACES
        MOVE 131 TO CURR-INDEX, CALL 'TDRP30'
        PERFORM MOV-H1.
    MOVE 132 TO CURR-INDEX, CALL 'TDRP20'.
    IF H-TWO IS UNEQUAL TO SPACES
        MOVE 132 TO CURR-INDEX, CALL 'TDRP30'
        PERFORM MOV-H2.
    MOVE 133 TO CURR-INDEX, CALL 'TDRP20'.
    IF H-THREE IS UNEQUAL TO SPACES
        MOVE 133 TO CURR-INDEX, CALL 'TDRP30'
        PERFORM MOV-H3 THRU EX-1.
    MOVE 134 TO CURR-INDEX, CALL 'TDRP20'.
    IF H-FOUR IS UNEQUAL TO SPACES
        MOVE 134 TO CURR-INDEX, CALL 'TDRP30'
        PERFORM MOV-H4 THRU EX-1.
    MOVE 135 TO CURR-INDEX, CALL 'TDRP20'.
    IF OM-JOB IS UNEQUAL TO SPACES
        MOVE 135 TO CURR-INDEX, CALL 'TDRP30'
        PERFORM PR-PUN.
    MOVE 136 TO CURR-INDEX, CALL 'TDRP20'.
    IF OM-J1B EQUALS 3
        MOVE 136 TO CURR-INDEX, CALL 'TDRP30'
        MOVE OM-J1C TO OM-36A
        MOVE OM-J1A TO
            OM-36B
    ELSE
        MOVE 137 TO CURR-INDEX, CALL 'TDRP20'
        MOVE OM-JOB TO OM-36.
    MOVE 138 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE SPACES TO H-ONE H-TWO H-THREE H-FOUR.
    MOVE 139 TO CURR-INDEX, CALL 'TDRP20'
    IF OM-JOB IS UNEQUAL TO SPACES
        MOVE 139 TO CURR-INDEX, CALL 'TDRP30'
        WRITE OT-MSTR
        ADD 1 TO MSTR-1100-CT.
CK-WHIP.
    MOVE 140 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE 141 TO CURR-INDEX, CALL 'TDRP20'
    IF OM-JOB EQUALS HOLD-W-JOB
        MOVE 141 TO CURR-INDEX, CALL 'TDRP30'
        GO TO MAT-WHIP.
    MOVE 142 TO CURR-INDEX, CALL 'TDRP20'.
    IF HOLD-W-JOB IS LESS THAN OM-JOB
        MOVE 142 TO CURR-INDEX, CALL 'TDRP30'
        GO TO LO-WHIP.
    MOVE 143 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE SPACES TO OT-MSTR.
    MOVE 144 TO CURR-INDEX, CALL 'TDRP20'
    IF CK-CD EQUALS 1
        MOVE 144 TO CURR-INDEX, CALL 'TDRP30'
        GO TO MOV1.
    MOVE 145 TO CURR-INDEX, CALL 'TDRP20'.
    IF CK-CD EQUALS 3
        MOVE 145 TO CURR-INDEX, CALL 'TDRP30'
        ALTER 1-SWXXX
            TO PROCEED TO MOV1
    ELSE
        MOVE 146 TO CURR-INDEX, CALL 'TDRP20'
        ALTER 1-SWXXX
            TO PROCEED TO CK-MSTR.
    MOVE 147 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE 0 TO CK-CD.
    GO TO READ-M.
MAT-WHIP.
    MOVE 148 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE SPACES TO OT-MSTR.
    MOVE 149 TO CURR-INDEX, CALL 'TDRP20'
    IF CK-CD EQUALS 1
        MOVE 149 TO CURR-INDEX, CALL 'TDRP30'
        ALTER 2-SWXXX
            TO PROCEED TO MOV1
        GO TO
            WHIP-MOV.
    MOVE 150 TO CURR-INDEX, CALL 'TDRP20'.
    IF CK-CD EQUALS 3
        MOVE 150 TO CURR-INDEX, CALL 'TDRP30'
        ALTER 2-SWXXX
            TO PROCEED TO MOV1
    ELSE
        MOVE 151 TO CURR-INDEX, CALL 'TDRP20'
        ALTER 2-SWXXX
            TO PROCEED TO CK-MSTR.
    MOVE 152 TO CURR-INDEX, CALL 'TDRP20'.
    ALTER 1-SWXXX
        TO PROCEED TO WHIP-MOV.
    MOVE ZERO TO CK-CD.
    GO TO READ-M.
LO-WHIP.
    MOVE 153 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE 154 TO CURR-INDEX, CALL 'TDRP20'
    IF TBLE-CTR IS LESS THAN 999
        MOVE 154 TO CURR-INDEX, CALL 'TDRP30'
        ADD 001 TO TBLE-CTR.
    MOVE 155 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE HOLD-W-JOB TO TAB-JOB (TBLE-CTR).
LO-W-RTN.
    MOVE 156 TO CURR-INDEX, CALL 'TDRP20'.
    ALTER 2-SWXXX
        TO PROCEED TO CK-WHIP.
    GO TO WHIP-MOV.
CLOS-MSTR.
    MOVE 157 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE 158 TO CURR-INDEX, CALL 'TDRP20'
    IF S-REC EQUALS HIGH-VALUES AND CR-IN EQUALS HIGH-VALUES
        MOVE 158 TO CURR-INDEX, CALL 'TDRP30'
        GO
            TO FIN-CL.
    MOVE 159 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE HIGH-VALUES TO IN-MSTR.
    GO TO 1-SW.
CLOS-WHIP.
    MOVE 160 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE 161 TO CURR-INDEX, CALL 'TDRP20'
    IF S-REC EQUALS HIGH-VALUES AND IN-MSTR EQUALS HIGH-VALUES
        MOVE 161 TO CURR-INDEX, CALL 'TDRP30'
        GO TO FIN-CL.
    MOVE 162 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE HIGH-VALUES TO CR-IN.
    GO TO 1A-SW.
CLOS-DET.
    MOVE 163 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE 164 TO CURR-INDEX, CALL 'TDRP20'
    IF IN-MSTR EQUALS HIGH-VALUES AND CR-IN EQUALS HIGH-VALUES
        MOVE 164 TO CURR-INDEX, CALL 'TDRP30'
        GO TO FIN-CL-DET.
    MOVE 165 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE HIGH-VALUES TO S-REC.
    GO TO 3-SW.
FIN-CL-DET.
    MOVE 166 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE 167 TO CURR-INDEX, CALL 'TDRP20'
    IF H-ONE EQUALS SPACES AND H-TWO EQUALS SPACES AND H-THREE
            EQUALS SPACES AND H-FOUR EQUALS SPACES
        MOVE 167 TO CURR-INDEX, CALL 'TDRP30'
        GO TO FIN-CL.
    MOVE 168 TO CURR-INDEX, CALL 'TDRP20'.
    IF H-ONE EQUALS HIGH-VALUES
        MOVE 168 TO CURR-INDEX, CALL 'TDRP30'
        GO TO FIN-CL.
    MOVE 169 TO CURR-INDEX, CALL 'TDRP20'.
    PERFORM UNEVEN.
FIN-CL.
    MOVE 170 TO CURR-INDEX, CALL 'TDRP20'.
    CLOSE MSTR-IN CR-FILE MSTR-OT PUNCH-OT.
    DISPLAY '1100 MASTER RCDS ' MSTR-1100-CT UPON PRINTER.
    MOVE 99 TO LIN-CTR.
    MOVE ZEROES TO PG-CTR.



WR-WHIP.
    MOVE 171 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE 172 TO CURR-INDEX, CALL 'TDRP20'
    IF CTR-1 EQUALS TBLE-CTR
        MOVE 172 TO CURR-INDEX, CALL 'TDRP30'
        GO TO LAST-CLOS.
    MOVE 173 TO CURR-INDEX, CALL 'TDRP20'.
    IF LIN-CTR IS GREATER THAN 50
        MOVE 173 TO CURR-INDEX, CALL 'TDRP30'
        PERFORM WHIP-HD.
    MOVE 174 TO CURR-INDEX, CALL 'TDRP20'.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB1.
    MOVE 175 TO CURR-INDEX, CALL 'TDRP20'
    IF CTR-1 EQUALS TBLE-CTR
        MOVE 175 TO CURR-INDEX, CALL 'TDRP30'
        GO TO LAST-CLOS.
    MOVE 176 TO CURR-INDEX, CALL 'TDRP20'.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB2.
    MOVE 177 TO CURR-INDEX, CALL 'TDRP20'
    IF CTR-1 EQUALS TBLE-CTR
        MOVE 177 TO CURR-INDEX, CALL 'TDRP30'
        GO TO LAST-CLOS.
    MOVE 178 TO CURR-INDEX, CALL 'TDRP20'.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB3.
    MOVE 179 TO CURR-INDEX, CALL 'TDRP20'
    IF CTR-1 EQUALS TBLE-CTR
        MOVE 179 TO CURR-INDEX, CALL 'TDRP30'
        GO TO LAST-CLOS.
    MOVE 180 TO CURR-INDEX, CALL 'TDRP20'.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB4.
    MOVE 181 TO CURR-INDEX, CALL 'TDRP20'
    IF CTR-1 EQUALS TBLE-CTR
        MOVE 181 TO CURR-INDEX, CALL 'TDRP30'
        GO TO LAST-CLOS.
    MOVE 182 TO CURR-INDEX, CALL 'TDRP20'.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB5.
    MOVE 183 TO CURR-INDEX, CALL 'TDRP20'
    IF CTR-1 EQUALS TBLE-CTR
        MOVE 183 TO CURR-INDEX, CALL 'TDRP30'
        GO TO LAST-CLOS.
    MOVE 184 TO CURR-INDEX, CALL 'TDRP20'.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB6.
    MOVE 185 TO CURR-INDEX, CALL 'TDRP20'
    IF CTR-1 EQUALS TBLE-CTR
        MOVE 185 TO CURR-INDEX, CALL 'TDRP30'
        GO TO LAST-CLOS.
    MOVE 186 TO CURR-INDEX, CALL 'TDRP20'.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB7.
    MOVE 187 TO CURR-INDEX, CALL 'TDRP20'
    IF CTR-1 EQUALS TBLE-CTR
        MOVE 187 TO CURR-INDEX, CALL 'TDRP30'
        GO TO LAST-CLOS.
    MOVE 188 TO CURR-INDEX, CALL 'TDRP20'.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB8.
    MOVE 189 TO CURR-INDEX, CALL 'TDRP20'
    IF CTR-1 EQUALS TBLE-CTR
        MOVE 189 TO CURR-INDEX, CALL 'TDRP30'
        GO TO LAST-CLOS.
    MOVE 190 TO CURR-INDEX, CALL 'TDRP20'.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB9.
    WRITE P-REC BEFORE ADVANCING 2 LINES.
    MOVE SPACES TO P-REC.
    ADD 02 TO LIN-CTR.
    GO TO WR-WHIP.
LAST-CLOS.
    MOVE 191 TO CURR-INDEX, CALL 'TDRP20'.
    WRITE P-REC BEFORE 01.
    CLOSE P-FILE.
    MOVE +1000 TO REC-CNT
    GO TO WRITE-TDRP-FILE.
WHIP-HD.
    MOVE 192 TO CURR-INDEX, CALL 'TDRP20'.
    MOVE SPACES TO P-REC.
    WRITE P-REC BEFORE ADVANCING NEXT-PG.
    ADD 0001 TO PG-CTR.
    MOVE 'LISTING OF JOBS ON LINK NO/ JOB ORDER FILE BUT NOT ON
    ' MAST' TO P-HD2-W.
    MOVE 'ER JOB ORDER FILE FOR ' TO P-HD2A-W.
    MOVE TODAYS-DT TO P-DT-W.
    MOVE 'PAGE ' TO P-W-PG.
    MOVE PG-CTR TO P-W-ID.
    WRITE P-REC BEFORE ADVANCING 2 LINES.
    MOVE SPACES TO P-REC.
    MOVE ' JOB-NO JOB-NO JOB-NO JOB-NO J
    'OB ' TO P-HD2.
    MOVE '-NO JOB-NO JOB-NO JOB-NO JOB-NO
    ' ' TO P-HD2A.
    WRITE P-REC BEFORE ADVANCING 2 LINES.
    MOVE SPACES TO P-REC.
    MOVE 04 TO LIN-CTR.

** TDRP REFORMATTED REPORT PROGRAM **

                                                               EXECUTION    TRUE   FALSE
                                                                   COUNT    PATH    PATH
CK-MSTR.
    IF IM-JOB EQUALS HOLD-JOB                                      57983    1485   56498
        GO TO EVEN-MAT.                                             1485
    IF IM-JOB IS GREATER THAN HOLD-JOB                             56498      81   56417
        MOVE 1 TO CK-CD                                               81
        GO TO                                                         81
            UNEVEN.
    MOVE IN-MSTR TO OT-MSTR.                                       56417
    IF OM-3 EQUALS 5 AND OM-4 IS LESS THAN TODAYS-DT               56417       0   56417
        MOVE 3 TO                                                      0
            OM-3
        PERFORM PR-PUN.                                                0
    IF OM-3 EQUALS 5 AND OM-5 IS LESS THAN TODAYS-DT               56417       0   56417
        MOVE 4 TO                                                      0
            OM-3
        PERFORM PR-PUN.                                                0
    IF OM-6 EQUALS 8 AND OM-7 IS LESS THAN TODAYS-DT               56417       0   56417
        MOVE 7 TO                                                      0
            OM-6
        PERFORM PR-PUN.                                                0
    IF OM-6 EQUALS 8 AND OM-8 IS LESS THAN TODAYS-DT               56417       0   56417
        MOVE 9 TO                                                      0
            OM-6
        PERFORM PR-PUN.                                                0
    IF OM-JOB IS UNEQUAL TO SPACES                                 56417   56417       0
        WRITE OT-MSTR                                              56417
        ADD 1 TO MSTR-1100-CT.                                     56417
    GO TO CK-WHIP.                                                 56417
EVEN-MAT.
    MOVE IN-MSTR TO OT-MSTR.                                        1485
    MOVE 3 TO CK-CD.                                                1485
UNEVEN.
    IF H-ONE IS UNEQUAL TO SPACES                                   1566     495    1071
        PERFORM MOV-H1.                                              495
    IF H-TWO IS UNEQUAL TO SPACES                                   1566     516    1050
        PERFORM MOV-H2.                                              516
    IF H-THREE IS UNEQUAL TO SPACES                                 1566     115    1451
        PERFORM MOV-H3 THRU EX-1.                                    115
    IF H-FOUR IS UNEQUAL TO SPACES                                  1566     497    1069
        PERFORM MOV-H4 THRU EX-1.                                    497
    IF OM-JOB IS UNEQUAL TO SPACES                                  1566    1566       0
        PERFORM PR-PUN.                                             1566
    IF OM-J1B EQUALS 3                                              1566     564    1002
        MOVE OM-J1C TO OM-36A                                        564
        MOVE OM-J1A TO                                               564
            OM-36B
    ELSE
        MOVE OM-JOB TO OM-36.                                       1002
    MOVE SPACES TO H-ONE H-TWO H-THREE H-FOUR.                      1566
    IF OM-JOB IS UNEQUAL TO SPACES                                  1566    1566       0
        WRITE OT-MSTR                                               1566
        ADD 1 TO MSTR-1100-CT.                                      1566
CK-WHIP.
    IF OM-JOB EQUALS HOLD-W-JOB                                    62018   15166   46852
        GO TO MAT-WHIP.                                            15166
    IF HOLD-W-JOB IS LESS THAN OM-JOB                              46852    4035   42817
        GO TO LO-WHIP.                                              4035
    MOVE SPACES TO OT-MSTR.                                        42817
    IF CK-CD EQUALS 1                                              42817       0   42817
        GO TO MOV1.                                                    0
    IF CK-CD EQUALS 3                                              42817     661   42156
        ALTER 1-SW                                                   661
            TO PROCEED TO MOV1

** TDRP REFORMATTED REPORT PROGRAM **
                                                               EXECUTION    TRUE   FALSE
                                                                   COUNT    PATH    PATH
    ELSE
        ALTER 1-SW                                                 42156
            TO PROCEED TO CK-MSTR.
    MOVE 0 TO CK-CD.                                               42817
    GO TO READ-M.                                                  42817
MAT-WHIP.
    MOVE SPACES TO OT-MSTR.                                        15166
    IF CK-CD EQUALS 1                                              15166      81   15085
        ALTER 2-SW                                                    81
            TO PROCEED TO MOV1
        GO TO                                                         81
            WHIP-MOV.
    IF CK-CD EQUALS 3                                              15085     824   14261
        ALTER 2-SW                                                   824
            TO PROCEED TO MOV1
    ELSE
        ALTER 2-SW                                                 14261
            TO PROCEED TO CK-MSTR.
    ALTER 1-SW                                                     15085
        TO PROCEED TO WHIP-MOV.
    MOVE ZERO TO CK-CD.                                            15085
    GO TO READ-M.                                                  15085
LO-WHIP.
    IF TBLE-CTR IS LESS THAN 999                                    4035     999    3036
        ADD 001 TO TBLE-CTR.                                         999
    MOVE HOLD-W-JOB TO TAB-JOB (TBLE-CTR).                          4035
LO-W-RTN.
    ALTER 2-SW                                                      4035
        TO PROCEED TO CK-WHIP.
    GO TO WHIP-MOV.                                                 4035
CLOS-MSTR.
    IF S-REC EQUALS HIGH-VALUES AND CR-IN EQUALS HIGH-VALUES           1       1       0
        GO                                                             1
            TO FIN-CL.
    MOVE HIGH-VALUES TO IN-MSTR.                                       0
    GO TO 1-SW.                                                        0
CLOS-WHIP.
    IF S-REC EQUALS HIGH-VALUES AND IN-MSTR EQUALS HIGH-VALUES         1       0       1
        GO TO FIN-CL.                                                  0
    MOVE HIGH-VALUES TO CR-IN.                                         1
    GO TO 1A-SW.                                                       1
CLOS-DET.
    IF IN-MSTR EQUALS HIGH-VALUES AND CR-IN EQUALS HIGH-VALUES         1       0       1
        GO TO FIN-CL-DET.                                              0
    MOVE HIGH-VALUES TO S-REC.                                         1
    GO TO 3-SW.                                                        1
FIN-CL-DET.
    IF H-ONE EQUALS SPACES AND H-TWO EQUALS SPACES AND H-THREE         0       0       0
            EQUALS SPACES AND H-FOUR EQUALS SPACES
        GO TO FIN-CL.                                                  0
    IF H-ONE EQUALS HIGH-VALUES                                        0       0       0
        GO TO FIN-CL.                                                  0
    PERFORM UNEVEN.                                                    0
FIN-CL.
    CLOSE MSTR-IN CR-FILE MSTR-OT PUNCH-OT.                            1
    DISPLAY '1100 MASTER RCDS ' MSTR-1100-CT UPON PRINTER.             1
    MOVE 99 TO LIN-CTR.                                                1
    MOVE ZEROES TO PG-CTR.                                             1
WR-WHIP.
    IF CTR-1 EQUALS TBLE-CTR                                         112       1     111
        GO TO LAST-CLOS.                                               1

Figure 3. TDRP Output Report
(Continued)

** TDRP REFORMATTED REPORT PROGRAM **



    IF LIN-CTR IS GREATER THAN 50
        PERFORM WHIP-HD.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB1.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB2.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB3.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB4.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB5.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB6.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB7.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB8.
    IF CTR-1 EQUALS TBLE-CTR
        GO TO LAST-CLOS.
    ADD 001 TO CTR-1.
    MOVE TAB-JOB (CTR-1) TO PW-JOB9.
    WRITE P-REC BEFORE ADVANCING 2 LINES.
    MOVE SPACES TO P-REC.
    ADD 02 TO LIN-CTR.
    GO TO WR-WHIP.
LAST-CLOS.
    WRITE P-REC BEFORE 01.
    CLOSE P-FILE.
    STOP RUN.
WHIP-HD.
    MOVE SPACES TO P-REC.
    WRITE P-REC BEFORE ADVANCING NEXT-PG.
    ADD 0001 TO PG-CTR.
    MOVE 'LISTING OF JOBS ON LINK NO/ JOB ORDER FILE BUT NOT ON
    'MAST' TO P-HD2-W.
    MOVE 'ER JOB ORDER FILE FOR ' TO P-HD2A-W.
    MOVE TODAYS-DT TO P-DT-W.
    MOVE 'PAGE ' TO P-W-PG.
    MOVE PG-CTR TO P-W-ID.
    WRITE P-REC BEFORE ADVANCING 2 LINES.
    MOVE SPACES TO P-REC.
    MOVE ' JOB-NO JOB-NO JOB-NO JOB-NO J
    'OB ' TO P-HD2.
    MOVE '-NO JOB-NO JOB-NO JOB-NO JOB-NO
    ' ' TO P-HD2A.
    WRITE P-REC BEFORE ADVANCING 2 LINES.
    MOVE SPACES TO P-REC.
    MOVE 04 TO LIN-CTR.

EXECUTION COUNT:  234
TRUE PATH:
FALSE PATH:       106  111  111  111  111  111  111  111  111

Figure 3. TDRP Output Report
(Continued)



Compile and execution times will be affected by
the code insertions. The increases in compile and
execution times for three sample test cases are
shown in Figure 4.



          COMPILE TIMES IN SECONDS
          BEFORE      AFTER    % INCREASE

TEST       18.74       26.7        42
TEST       27.71       39.4        42
TEST       31.41       40.4        21

          EXECUTION TIMES IN SECONDS
          BEFORE      AFTER    % INCREASE

TEST      1998.4     3010.3        51
TEST       292.2      393.7        34
TEST       824.9      993.?        20

Figure 4. Sample compile and execution times
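
The percentage columns in Figure 4 appear to follow the usual relative-increase formula, (after - before) / before. The short Python sketch below is only an illustration of that arithmetic, checked against the first row of each table; the function name is hypothetical and is not part of TDRP.

```python
def percent_increase(before: float, after: float) -> float:
    """Return the percentage by which `after` exceeds `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100.0


# First row of the compile-time table: 18.74 s before, 26.7 s after.
compile_overhead = percent_increase(18.74, 26.7)
# First row of the execution-time table: 1998.4 s before, 3010.3 s after.
execution_overhead = percent_increase(1998.4, 3010.3)

print(f"compile: {compile_overhead:.1f}%  execution: {execution_overhead:.1f}%")
```

Rounded to whole percentages, these two values agree with the 42 and 51 reported in the figure.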



The additional memory requirement for
execution is less than 2K words, which includes
object modules and the data buffer but not the
inserted COBOL statements.

Lastly, only one data file may be reduced
per TDRP run. All original files must be used in
a TDRP run, even though reduced files may exist,
to ensure that all logic paths executed for the
original data file are exercised. In addition,
if a reduced file is run through TDRP to be
reduced a second time, tests have shown that
testing coverage is affected. It should be noted
that the status of testing coverage is uncertain
when several reduced files are used in testing,
since the original intention of TDRP was to
reduce a single master file to conserve computer
resources over multiple runs (2).
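
One way to check whether a run on reduced data still exercises the logic paths of the original run is to compare the true-path and false-path counts of the two reports. The Python sketch below is a hypothetical illustration and not a description of how TDRP itself works. The counts for the original run are taken from Figure 3 (statement indexes 120, 144 and 145); the counts for the reduced run are invented for the example.

```python
from typing import Dict, List, Set, Tuple

# (true-path count, false-path count) per conditional statement index
PathCounts = Dict[int, Tuple[int, int]]


def covered_paths(counts: PathCounts) -> Set[Tuple[int, str]]:
    """Return the set of (index, branch) pairs executed at least once."""
    paths: Set[Tuple[int, str]] = set()
    for index, (true_count, false_count) in counts.items():
        if true_count > 0:
            paths.add((index, "true"))
        if false_count > 0:
            paths.add((index, "false"))
    return paths


def lost_paths(original: PathCounts, reduced: PathCounts) -> List[Tuple[int, str]]:
    """Paths exercised by the original run but not by the reduced run."""
    return sorted(covered_paths(original) - covered_paths(reduced))


# Counts from Figure 3 for the original data file.
original_run = {120: (1485, 56498), 144: (0, 42817), 145: (661, 42156)}
# Invented counts for a run on reduced data (assumption for illustration).
reduced_run = {120: (150, 5600), 144: (0, 4300), 145: (0, 4300)}

print(lost_paths(original_run, reduced_run))
```

An empty result would indicate that the reduced run covers every path the original run covered; any entry listed is a path that would go untested.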

6. Examples 

The following procedure was used as a
guideline for each test case run. Initially,
the program was run with the original data base.
Then the TDRP run was made, producing the reduced
data base. A ninety percent reduction in the
original data base was user-specified. Lastly,
the program was run with the reduced data base
to ascertain the savings in CPU time.

In the first test (Figure 5), two master
files were reduced. The savings in storage and
CPU time become quite apparent when both reduced
files are used to run a test.



TEST 1

PA-MSTR FILE
                       ORIGINAL       AFTER     PERCENTAGE
                       DATA FILE    REDUCTION     SAVINGS
NUMBER OF RECORDS
  (STORAGE)              45450         4548        89.99
CPU TIME (SECONDS)      757.71       670.72        11.48

DU-MSTR FILE
                       ORIGINAL       AFTER     PERCENTAGE
                       DATA FILE    REDUCTION     SAVINGS
NUMBER OF RECORDS
  (STORAGE)              28670         2868        90.00
CPU TIME (SECONDS)      757.71       245.16        67.64

COMBINED PA-MSTR AND DU-MSTR FILES
                       ORIGINAL       AFTER     PERCENTAGE
                       DATA FILES   REDUCTION     SAVINGS
TOTAL NO. OF RECORDS
  (STORAGE)              74120         7416        89.99
CPU TIME (SECONDS)      757.71       157.49        79.22

Figure 5. Results from Test 1
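
The percentage savings in Figure 5 are consistent with the formula (original - reduced) / original. The Python sketch below, with a hypothetical function name, reproduces the combined-file row as a check on that reading.

```python
def percent_savings(original: float, reduced: float) -> float:
    """Return the percentage saved when `original` shrinks to `reduced`."""
    if original <= 0:
        raise ValueError("original must be positive")
    return (original - reduced) / original * 100.0


# Combined PA-MSTR and DU-MSTR files, values from Figure 5.
record_savings = percent_savings(74120, 7416)
cpu_savings = percent_savings(757.71, 157.49)

print(f"records: {record_savings:.2f}%  CPU time: {cpu_savings:.2f}%")
```

The same function applied to the single-file rows gives the other percentages in the figure, for example 28670 to 2868 records for the DU-MSTR file.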



In tests 2 and 3 (Figure 6), only one master file
was processed. In both test cases, the
savings in CPU time are considerable.



TEST 2
PW115M01-IN
                       ORIGINAL       AFTER     PERCENTAGE
                       DATA FILE    REDUCTION     SAVINGS
NUMBER OF RECORDS
  (STORAGE)              13021         1314        89.91
CPU TIME (SECONDS)       98.29        16.04        83.68

TEST 3
PW143D01-IN
                       ORIGINAL       AFTER     PERCENTAGE
                       DATA FILE    REDUCTION     SAVINGS
NUMBER OF RECORDS
  (STORAGE)              138??         1386        89.98
CPU TIME (SECONDS)      257.86        54.08        79.03
Figure 6. Results from Tests 2 and 3

7. Constraints

TDRP currently handles up to 1000 conditional
statements but may easily be modified to handle
more. Files for reduction should not be used in
a COBOL sort, since the program needs to
instrument the "read" statement for a file in
order to extract the records. Lastly, at
present, data files with control or dependent
records will yield unpredictable results after
reduction. This control record problem has been
analyzed, and modifications are currently being
unit tested.
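
The fixed limit on conditional statements suggests a fixed-size table of counters indexed by statement number, which is also how the execution, true-path and false-path columns of Figure 3 can be pictured. The Python sketch below is a hypothetical model only: the class, its method names and the table layout are assumptions, and the roles given to the inserted CALL 'TDRP20' and CALL 'TDRP30' statements are inferred from where they appear in the instrumented listing.

```python
MAX_CONDITIONALS = 1000  # limit quoted in the text; the table layout is an assumption


class PathCounters:
    """Hypothetical stand-in for the counters behind the Figure 3 report."""

    def __init__(self, capacity: int = MAX_CONDITIONALS) -> None:
        self.executed = [0] * capacity
        self.true_path = [0] * capacity

    def reached(self, index: int) -> None:
        """Count one execution of statement `index` (role assumed for CALL 'TDRP20')."""
        self.executed[index] += 1

    def took_true(self, index: int) -> None:
        """Count one true outcome at statement `index` (role assumed for CALL 'TDRP30')."""
        self.true_path[index] += 1

    def false_path(self, index: int) -> int:
        """False-path count, derived as executions minus true outcomes."""
        return self.executed[index] - self.true_path[index]


counters = PathCounters()
# Made-up (IM-JOB, HOLD-JOB) pairs standing in for input records.
jobs = [("A1", "A1"), ("B2", "A9"), ("C3", "C3"), ("D4", "C8")]
for im_job, hold_job in jobs:
    counters.reached(120)            # before IF IM-JOB EQUALS HOLD-JOB
    if im_job == hold_job:
        counters.took_true(120)      # inside the true branch

print(counters.executed[120], counters.true_path[120], counters.false_path(120))
```

In this model an index of 1000 or more raises an IndexError, which mirrors the stated limit; raising the capacity argument corresponds to the modification the text says is easy to make.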